


PATENT 

Attorney Docket No.: 16869S-089800US 
Client Ref. No.: W1094-01 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re application of: 

KOJI SONODA et al. 
Application No.: 10/637,216 
Filed: August 8, 2003 
For: COMPUTER SYSTEM 
Customer No.: 20350 



Examiner: Unassigned 

Technology Center/ Art Unit: 2131 

Confirmation No.: 3324 

PETITION TO MAKE SPECIAL FOR 
NEW APPLICATION UNDER M.P.E.P. 
§ 708.02, VIII & 37 C.F.R. § 1.102(d) 



Commissioner for Patents 

P.O. Box 1450 

Alexandria, VA 22313-1450 

Sir: 

This is a petition to make special the above-identified application under MPEP 
§ 708.02, VIII & 37 C.F.R. § 1.102(d). The application has not received any examination by 
an Examiner. 



(a) The Commissioner is authorized to charge the petition fee of $130 
under 37 C.F.R. § 1.17(i) and any other fees associated with this paper to Deposit Account 
20-1430. 






Page 1 of 5 



Appl. No. 10/637,216 
Petition to Make Special 






(b) All the claims are believed to be directed to a single invention. If the 
Office determines that all the claims presented are not obviously directed to a single 
invention, then Applicants will make an election without traverse as a prerequisite to the 
grant of special status. 

(c) Pre-examination searches were made of U.S. issued patents, including 
a classification search and a key word search. The classification search was conducted on or 
around June 29, 2004 covering Classes 707 (subclass 200), 709 (subclass 220), and 711 
(subclasses 100, 112, 114, and 156), by a professional search firm, Lacasse & Associates, 
LLC. The key word search was performed on the USPTO full-text database including 
published U.S. patent applications. The inventors further provided a reference considered 
most closely related to the subject matter of the present application (see reference #6 below), 
which was cited in the Information Disclosure Statement filed with the application on August 
8, 2003. 

(d) The following references, copies of which are attached herewith, are 
deemed most closely related to the subject matter encompassed by the claims: 

(1) U.S. Patent Publication No. 2001/0029507 A1; 

(2) U.S. Patent Publication No. 2003/0046369 A1; 

(3) U.S. Patent Publication No. 2003/0225972 A1; 

(4) U.S. Patent Publication No. 2004/0015520 A1; 

(5) Japanese Patent Publication No. 2003-015931; and 

(6) John Kubiatowicz et al., "OceanStore: An Architecture for 
Global-Scale Persistent Storage," Proceedings of the Ninth International Conference on 
Architectural Support for Programming Languages and Operating Systems (ASPLOS 2000), 
November 2000, pp. 190-201. 

(e) Set forth below is a detailed discussion of references which points out 
with particularity how the claimed subject matter is distinguishable over the references. 



Page 2 of 5 






A. Claimed Embodiments of the Present Invention 

The claimed embodiments relate to technology for a dispersed type of 
computer system and, more particularly, to a computer system in which there is flexibility to 
add or delete file attributes. 

Independent claim 1 recites a computer system comprising a first storage 
device storing file management information, a first computer connected to the first storage 
device, a second computer connected to the first computer via a network, and a second 
storage device storing file data managed by the file management information, connected to 
the second computer. 

Independent claim 14 recites a method of managing data. The method 
comprises storing file management information in a first storage device; connecting a first 
computer to the first storage device; connecting a second computer to the first computer via a 
network; storing file data in a second storage device; connecting the second storage device to 
the second computer; and managing the file data stored in the second storage device using the 
file management information stored in the first storage device. 
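
The arrangement recited in claims 1 and 14 can be pictured with a minimal sketch. The names below (StorageDevice, Computer, store_file, read_file) are hypothetical and do not appear in the application; in-memory dictionaries stand in for the storage devices, and a plain lookup table stands in for the network connection between the computers. The sketch only illustrates the separation of file management information from file data, and is not a description of any actual embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    """Hypothetical stand-in for a storage device: a named key/value store."""
    name: str
    records: dict = field(default_factory=dict)


@dataclass
class Computer:
    """Hypothetical stand-in for a computer connected to one storage device."""
    name: str
    storage: StorageDevice


def store_file(first: Computer, second: Computer, path: str,
               data: bytes, attrs: dict) -> None:
    """Store file data on the second storage device and the corresponding
    file management information (holder plus attributes) on the first."""
    second.storage.records[path] = data
    first.storage.records[path] = {
        "holder": second.name,      # which computer holds the file data
        "attributes": dict(attrs),  # attributes can be added or removed freely
    }


def read_file(first: Computer, computers: dict, path: str) -> bytes:
    """Look up the management information on the first storage device,
    then fetch the file data from the computer it points to."""
    info = first.storage.records[path]
    holder = computers[info["holder"]]  # assumed lookup in place of a network hop
    return holder.storage.records[path]
```

In this sketch, adding or deleting a file attribute touches only the record kept on the first storage device; the file data on the second storage device is left unchanged.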

One benefit that may be derived is the flexibility to add file attributes for each 
file and efficient management of access rights information for multiple users. Where a file is 
stored using a plurality of storage devices managed by servers dispersed in a plurality of sites, 
embodiments of the invention provide a computer system in which accounting information 
can be managed in respect of each site and each server. 

B. Discussion of the References 

None of the following references disclose or suggest a first storage device 
storing file management information and connected to a first computer; and a second storage 
device connected to a second computer which is connected to the first computer and storing 
file data managed by the file management information. 



Page 3 of 5 






1. U.S. Patent Publication No. 2001/0029507 A1 

This reference discloses a database-file link system and method therefor. A 
file server 20 and a database DB server 30 are connected through a communication network 
90. A content information file 244 retains attribute information of the DB managed contents. 
During database registration, an OS file accessing control information 245 (for DB managed 
contents 243) is updated by means of the File Management System (FMS) 242. The OS file 
accessing control information 245, DB managed contents 243, File Management System 242, 
OS 241, and contents information file 244 all reside in the file server 20. See Figs. 1-3; and 
paragraphs [0039], [0040], [0043], [0074], [0078], [0080], [0084], [0090], [0091], and 
[0093]. 

2. U.S. Patent Publication No. 2003/0046369 A1 

This reference relates to a method and apparatus for initializing a new node in a 
network. A storage system (content repository) 1530 includes content provider's account 
information, assigned content management server, reserved storage, number of media files, 
and media file's attributes. A file metadata database within the storage system 1530 holds 
file metadata related to block files (block size, attributes, etc.). See Fig. 15; and paragraphs 
[0208], [0209], [0211], and [0215]. 

3. U.S. Patent Publication No. 2003/0225972 A1 

This reference discloses a storage system in which a file attribute control unit 
1331 and a storage unit 143 that executes processing are linked together in response to a request 
from client computer(s) 11a-11b. The host computer 13 executes a file attribute control 
program 1331 to add a particular attribute to a file. The storage unit 143 operates in response 
to the added attribute. See Figs. 1 and 4; and paragraphs [0033]-[0037], [0040], [0041], and 
[0049]-[0051]. 

4. U.S. Patent Publication No. 2004/0015520 A1 

This reference discloses a database managing method and system having a data 
backup function and associated programs. A database management program 10 has a data 
attribute changing module 140 for registering a data attribute or changing a data attribute 
stored in a data attribute table 35. See Figs. 1 and 4; and paragraphs [0015], [0018], and 
[0019]. 

5. Japanese Patent Publication No. 2003-015931 

This reference relates to an information processing system and storage area 
providing method intended to provide a data storage system in consideration of the attributes 
(performance and cost) of storage devices. A plurality of storage devices have different 
attributes; and an attribute holding means is provided for holding information representing 
the attribute of each storage device. 

6. John Kubiatowicz et al., "OceanStore: An Architecture for Global-Scale 
Persistent Storage," Proceedings of the Ninth International Conference on 
Architectural Support for Programming Languages and Operating Systems 
(ASPLOS 2000), November 2000, pp. 190-201. 

This reference discloses a utility infrastructure designed to span the globe and 
provide continuous access to persistent information. Because the infrastructure is comprised 
of untrusted servers, data is protected through redundancy and cryptographic techniques. To 
improve performance, data is allowed to be cached anywhere, anytime. Monitoring of usage 
patterns allows adaptation to regional outages and denial of service attacks. Monitoring also 
enhances performance through proactive movement of data. 

(f) In view of this petition, the Examiner is respectfully requested to issue 
a first Office Action at an early date. 



Respectfully submitted, 




Chun-Pok Leung 
Reg. No. 41,405 

TOWNSEND and TOWNSEND and CREW LLP 

Two Embarcadero Center, 8th Floor 

San Francisco, California 94111-3834 

Tel: 650-326-2400 

Fax: 415-576-0300 

Attachments 

RL:rl 

60272139 v1 



Page 5 of 5 




PTO/SB/17 (10-03) 

FEE TRANSMITTAL 
for FY 2004 

1/2003. Patent fees are subject to annual revision. 

[ ] Applicant claims small entity status. See 37 CFR 1.27 

TOTAL AMOUNT OF PAYMENT ($) 130.00 

Complete if Known 

Application Number:     10/637,216 
Filing Date:            August 8, 2003 
First Named Inventor:   SONODA, Koji 
Examiner Name:          Unassigned 
Art Unit:               2131 
Attorney Docket No.:    16869S-089800US 

METHOD OF PAYMENT (check all that apply) 

[ ] Check   [ ] Credit Card   [ ] Money Order   [ ] Other   [ ] None 
[X] Deposit Account 

Deposit Account Number:  20-1430 
Deposit Account Name:    Townsend and Townsend and Crew LLP 

The Director is authorized to: (check all that apply) 

[X] Charge fee(s) indicated below    [X] Credit any overpayments 
[X] Charge any additional fee(s) or any underpayment of fee(s) 
[ ] Charge fee(s) indicated below, except for the filing fee 
to the above-identified deposit account. 

FEE CALCULATION 

1. BASIC FILING FEE 

Large Entity        Small Entity 
Fee Code  Fee ($)   Fee Code  Fee ($)   Fee Description          Fee Paid 
1001      770       2001      385       Utility filing fee 
1002      340       2002      170       Design filing fee 
1003      530       2003      265       Plant filing fee 
1004      770       2004      385       Reissue filing fee 
1005      160       2005      80        Provisional filing fee 

SUBTOTAL (1) ($) 0.00 

2. EXTRA CLAIM FEES FOR UTILITY AND REISSUE 

                      Extra Claims   Fee from below   Fee Paid 
Total Claims 
Independent Claims 
Multiple Dependent 

Large Entity        Small Entity 
Fee Code  Fee ($)   Fee Code  Fee ($)   Fee Description 
1202      18        2202      9         Claims in excess of 20 
1201      86        2201      43        Independent claims in excess of 3 
1203      290       2203      145       Multiple dependent claim, if not paid 
1204      86        2204      43        ** Reissue independent claims over original patent 
1205      18        2205      9         ** Reissue claims in excess of 20 and over original patent 

SUBTOTAL (2) ($) 0.00 

**or number previously paid, if greater; For Reissues, see above 

FEE CALCULATION (continued) 

3. ADDITIONAL FEES 

Large Entity        Small Entity 
Fee Code  Fee ($)   Fee Code  Fee ($)   Fee Description                                        Fee Paid 
1051      130       2051      65        Surcharge - late filing fee or oath 
1052      50        2052      25        Surcharge - late provisional filing fee or cover sheet 
1053      130       1053      130       Non-English specification 
1812      2,520     1812      2,520     For filing a request for reexamination 
1804      920*      1804      920*      Requesting publication of SIR prior to Examiner action 
1805      1,840*    1805      1,840*    Requesting publication of SIR after Examiner action 
1251      110       2251      55        Extension for reply within first month 
1252      420       2252      210       Extension for reply within second month 
1253      950       2253      475       Extension for reply within third month 
1254      1,480     2254      740       Extension for reply within fourth month 
1255      2,010     2255      1,005     Extension for reply within fifth month 
1401      330       2401      165       Notice of Appeal 
1402      330       2402      165       Filing a brief in support of an appeal 
1403      290       2403      145       Request for oral hearing 
1451      1,510     1451      1,510     Petition to institute a public use proceeding 
1452      110       2452      55        Petition to revive - unavoidable 
1453      1,330     2453      665       Petition to revive - unintentional 
1501      1,330     2501      665       Utility issue fee (or reissue) 
1502      480       2502      240       Design issue fee 
1503      640       2503      320       Plant issue fee 
1460      130       1460      130       Petitions to the Commissioner                          130 
1807      50        1807      50        Petitions related to provisional applications 
1806      180       1806      180       Submission of Information Disclosure Stmt 
8021      40        8021      40        Recording each patent assignment per property (times number of properties) 
1809      770       2809      385       Filing a submission after final rejection (37 CFR § 1.129(a)) 
1810      770       2810      385       For each additional invention to be examined (37 CFR § 1.129(b)) 
1801      770       2801      385       Request for Continued Examination (RCE) 
1802      900       1802      900       Request for expedited examination of a design application 

Other fee (specify) 

*Reduced by Basic Filing Fee Paid            SUBTOTAL (3) ($) 130.00 

SUBMITTED BY 

Name (Print/Type):                  Chun-Pok Leung 
Registration No. (Attorney/Agent):  41,405 
Telephone:                          650-326-2400 
Signature: 
Date:                               August 6, 2004 

60278906 v1 

WARNING: Information on this form may become public. Credit card information should not be 
included on this form. Provide credit card information and authorization on PTO-2038. 



PATENT ABSTRACTS OF JAPAN 

(11) Publication number: 2003-015931 
(43) Date of publication of application: 17.01.2003 

(51) Int.Cl. G06F 12/00 

(21) Application number: 2001-200455 
(22) Date of filing: 02.07.2001 
(71) Applicant: HITACHI LTD 
(72) Inventor: KANEDA TAISUKE 
               ARAKAWA TAKASHI 
               EGUCHI KENTETSU 
               MOGI KAZUHIKO 
               ARAI HIROHARU 

(54) INFORMATION PROCESSING SYSTEM AND STORAGE AREA PROVIDING METHOD 

(57) Abstract: 
PROBLEM TO BE SOLVED: To provide a data storage 
system in consideration of the attributes (performance 
and cost), the operating rate and the data usage 
frequency of the storage device in a storage area 
network constituted of a plurality of the storage devices 
that have different attributes and are used in a business 
pattern of a storage service provider or the like. 
SOLUTION: The information processing system is 
provided with a location management means for 
managing the location of data stored in the storage 
device, an information replication means for replicating 
data between the storage devices, and an attribute 
holding means for holding information representing the 
attribute of each storage device. The location 
management means is so constituted as to use the information replication means to replicate 
and move the data between the storage devices based on the operating rate of, the data 
usage status of, control information on and the accounting information on the storage device. 
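
The abstract names the inputs the location management means relies on (device attributes such as performance and cost, operating rate, data usage status, and accounting information) but not a specific selection rule. The sketch below is therefore only an assumed illustration: the DeviceAttributes fields, the choose_device function, and the 0.8 operating-rate threshold are hypothetical and are not taken from the reference. It picks the lowest-cost device that meets a performance requirement and is not overloaded.

```python
from dataclasses import dataclass


@dataclass
class DeviceAttributes:
    """Hypothetical record kept by an attribute holding means."""
    name: str
    performance: int       # higher is faster (assumed scale)
    cost: int              # cost per unit of storage (assumed scale)
    operating_rate: float  # fraction of time the device is busy, 0.0 to 1.0


def choose_device(devices, min_performance, max_operating_rate=0.8):
    """Return the cheapest device that is fast enough and not overloaded,
    or None when no device qualifies. The rule itself is an assumption."""
    candidates = [
        d for d in devices
        if d.performance >= min_performance
        and d.operating_rate <= max_operating_rate
    ]
    return min(candidates, key=lambda d: d.cost, default=None)
```

A location manager built along these lines would call choose_device when deciding where to replicate or move data, then update its record of where the data resides.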



http://www19.ipdl.jpo.go.jp/PA1/result/detail/main/wAAAnEaioODA415015931P11.htm 6/29/2004 

LEGAL STATUS 
[Date of request for examination] 

[Date of sending the examiner's decision of 
rejection] 

[Kind of final disposal of application other than 
the examiner's decision of rejection or 
application converted registration] 

[Date of final disposal for application] 

[Patent number] 

[Date of registration] 

[Number of appeal against examiner's 
decision of rejection] 

[Date of requesting appeal against examiner's 
decision of rejection] 

[Date of extinction of right] 

Copyright (C); 1998,2003 Japan Patent Office 



mnmrntf up) (12) & |fj 4rF < A > avmtm'mm 

#^2003-15931 
(P2003-15931A) 



(43yjfflB ¥j£l5i£l 3170(2003. 1.17) 



(51) IntCL 7 




FI 


f-?3-r(##) 


G 0 6 F 12/00 


5 4 5 


G0 6F 12/00 54 5A 5B0 8 2 




5 2 0 




5 2 OE 








52 0 J 






tern* 


W«S©»16 OL (£27K) 




&BI2001 — 200455C P2001 —200455) 


(7i)taiaA 


000005108 










\£f&/ mm i-j 


5&fi£13$E7 M 2 B (2001 7 2) 




















#^il|»;i|«W#*KS#^1099#S!! «c 
















WW ®$L 






















(74)ttSA 


100098954 



















(54) wmo&im ®mimi'*Tk%&Tmwmmc)j& 

(57) 



A/ 




N 




A/ 


70 


70 


70 


70 


70 




80 /" 80 / v 80 
201 202 



-1- 



[«fM»#©«Hl 

Kit, MCttSffa^SH:, WEIIttfiM*#a©fflf« 
ri. 



t, «9BB«S6BfW»ti!fa^#S;J:OK«tfc®J«ifl!f 
[19**1 o] M#K l KB*©* W&m*>x7*tcts 

fneiutfitM&ftNt. flr«o»»K*»*»«=»^h 

[HI^9C1 1] l»*3ll OKBHoilMWra^yAlc 

*»*t, strBficsta^att, mbb«^b 

[f&#Ei 2] B»#*lfcB#©»«&«i'*7 l Afc:*i 
v*T» MBffiS^S^ai*. MBWffll^a!i»6>#flB$H 

-r 5 r. t zimt ■r^mim^T a„ 

[8***1 3] BNtqilKEilOflMRJQai/^^AKjB 

[nt** 1 4 ] it** i Kmm^nmmu^ ak* 
^-c, ^B«*as!#ai4, mkTtv-omm*. s 

W 3 BWWHMWSfife-e fcot, 

ttEMfPff«ic« CfcJ^tt^RfolBtiiSSI-MBT'-^ 

[»** i e i Ettas«wE«««*a*©a— ^ Kit 



-2- 



=>^ht, ]|5Ex-*£&#Lfc$I!8fc£C-C, S&IEa- 
— f^oWtetZfift 5 *x y ^fc ZffiiLtcZ t t 

[0 0 0 1] 

fc.fcT/X-T'y^yP&S-^fcxV**^ 

xx!)£Eg/.tJf) ORIA&ffi&HU »te, «*©Jlfc 
5fttt£toEt§gg£«fc©8iiJ®gB (=^faL- 

[0 0 0 2] 

[0 0 0 3] JLE#ff"m> »©^>f^-;?^fj^ 

vvr©E$a s fcv\ 

[0 0 0 4] rO^iC-OV^TI*. Tivoli Systems Inc. 
ni^V 4 h"<— /<— Tvision, Tivoli Storage Managem 
ent Solutions for the Information Gridj Ic. X Y- U 
-^!)7*y hy-^tUol^-C, Mtt©g&5* M' 
-^£M£v>WCx-* *rfiU*i-5tt« *«^* Jirv^ 

[0 0 0 5] ^hV-^f-^T'D/My (SS 

P:Storage Service Provider) fcB^lfJXS t^i^.****) 
5. XbU^t-^o^^li, «*©£#«::* 

i^-t*;*;7'n/^ ; m£fl;©ta£fr;Ui<}:^ 
fdttT*tt^<, «g©jjt».5ttffi*»fI8tt, =>*K ft 

too. ;* r u-i/\zfrfr2>m&<g a=* h*H"J«U4 



[0 00 6] 

B8«jWII*LJ:5J:1-aMH] *!6^©SWtt, fct 

(Ett£«) X®f$.£HZ>X h U-iSx. })T*yh V- 
:»3£tfxW©*Jffl*^#ltLfcxW©&#;*f 
[0 00 7] 

[3&®£ft?fc1-£;fc©©^®] ±EBWSrS^t-5fcA 
IC #3§?Utt, E«gttKfciW£;ft*:x , -*©&B£f : 
ai-6ttfita#©i:, Ett&g|g|-ex-*©mK&*T 

fiWLfc«tt*«fc*-3&, &B£a#Rl*x-*©S 

[0 0 0 8] *fc, &S£a#Rfctt, =yf a -«> 
P,iJM7r>fM-A (flW©#fe»JJr**-t-flHll 

•am K#&r3x-?©&giff#&'>#< tt>-o 

wawasrRrt, ^gfa¥^ttrcD^#i^4tt 

«fifct Lit. z<D&®®m&®t3.&x h u-vm-wt 
[ooio] ttiti^SKtt. nyfi-?* 

[ooii] *fc, ttwra^Rfci*, =ve a -^i> 

Jx»cj;t» s ftgfa^Rtt. ^Fu-^oSi$, -r- 

[0 0 12] firafa^RlwS. n^t^-^A^ 

<»>f>V--7 I DKgo'fx'-:? io&xm®i*'K 
itZffil&b Lit. 



-3- 



[00131 £fc. nyifa-?i;ti, mma&&# 

[0 0 14] 

[359f©0&lfe©J&«] £*T, igffi£ffivv-C#389!<DS&l 

[00151B1B.*1 ©jusj&sg©^^ hMim 
£^-f 0 AS, 3-&<D3^tf3.— * 1 0 1, 1 

0 2, 151i, 3^0T-f **7l"i'3£B2 0 1, 2 
0 2, 203i, 7^y7!)gl3 0 1t, 

o l fc£B;tTV*.5. 
01, 102, 1 5 1 fc, x^^^T 0 1 , 
202, 203t, 7^^71)^63 0 114, 
^■■r*/U5 0ft^-UT, 50 
lK«»f5. 77'f^ft^'f?f5 0 1li, SI 
«3ftfc#B«B©^*©»ic*#9#it*f?7. * 
t nyifa-^lOl, 102, 15 114, -f-f- 
* V Y 6 0 «rffl^T«ZK:8SIM-$. 
[0 0 l 61 ft*s, ffio£jBm«>BK:JI!2*J:0)|l 
3©BB»B-fcovvtRWi-S#» S&2*S4tfSS3©* 
^ffiro^r a t>JK l ©itlBgft© s/xy a t na*ai© 
*fi£-e$>5©-eia i ^fflv^rmw-rSo Eifcfc^-c, 

aVtfa-^10 1, 1 0 2f <0*-YS'i/af f -f^^9 

ottfif«s«*a9i, ffiwc, aytrwioi, 

10 2,7^^7^8120 1, 202, 203, 
* 4l*7V 7* ? !) 8B 3 0 1 4>4>ttflk*tf-M¥£ 8 2 fc; 

ov^rtt, ®2<D^jg^Bro^^7 1 Aic@^r»fi}fig-eS) 

*r©«j*T-$>5©-ci& i *$ ±rf» 2 ©3afi»«-ctt#s 
Utv\ 

[0 0 1 71 Ell ICteVN-C, 7r4'*-f'*r*>\'§0*m 
^xmmJt^v M7— ^tt— jRttK SAN (Storage A 
rea Network) tuftfjx, h 6 0 £fflV>-C« 

J* Life* y r- 17-7 14— BttC LAN (Local Area Netw 
ork) i: 01TI4, 77^^f t^50M 
— jr*y f-6 0©IO(D^y h!7-^SrffiV>-C«fi$;Lfc 
m^l4, f&©*7 M7-*-Ct>.k^U ^-ir* 
y h 6 0©«IBS:7T'f^7-r^5 O-eWf^toU-r-5 

[00 181 3^ye a -^oH, ^yfa-^ 

ism, ^-^©teewa^as o i ktfmmm*® 

8 9 0SM5. tfcB£a¥B8 0 1 i flMMUK^B 8 

9 o©!$fflico^-c(4&ai-r.5 <> 

[0 0 19] ftlc, T^^7W8ItMW¥a 
fco^TRWfS. 7^7.7TW;gEB(4, 
&©xn— Kr-f *7$lg£WU RA I D (Redundant A 
rrayof Inexpensive Disks)t£$ffl^4 9 , ttstlS^LFfls 
i|gtt©&#£I2ofcie«^fiT*;fe5,, f^^7Mi 
■14, 7 J 4**7W3SB£«jS1-5^-K? f -f**36 



B©&|g, &$c, *54T*RA I D 1"<MC& QZ<D&m 
<JlftoT<£„ fciM, RAID0 
k umftS 7-7 £«|fc©^- K7V 7. 7 fc 

^Sfc-rsri:T-tttg|4S5<*5>}S, RAlD0«t 
38B©^- K7 -f 7 7© 5 *> 1 is-Vhtkm-tZby*- 
**&0fe ] tm.ft1hZ>o RA I D 5 iPsfffftS W 

^/H4, 7-7 Srfflfc©^- K9» * * fc£SH-* i 

•tSrt-C, RAID5«r#lJ?M-5«&©^-K7'-i7> 

7©5*>l£a5&l*LTt>, 7*-*&«7GWffiftffi!Btt 

SrSPl/CV^S. RA I D 5(4, RA I D 0 (Ctt^T^ 

ii^©tt!6^[ST-t-5. 4fc, RAI DO*54U!RA 
ID5it>, «j^5/>-KtV**KB©£&I;:4 9 
*©ttffi#Mft.5. 

[0 0 2 01 *!U&Bffi-ci4, ^^^7^812 0 

1 1 2 0 2-SrRA I DO-ei^L, ^-i *77W 3£B 

2 0 3£RAI D5t?«^-t-5t©it-5o ^rtl^ft© 
r-fX^H'SSilctt, MteffifNMasoSrBttfco 
7*^^7^812 01t20 2©fttt«#^&8 0 
fi, ^-KTW*^£Efi-C«J&£*t-5RAI DO-CfcS 
it, liM^S&fc *>©«#=>* K *rirt-5/N-Kf» 

-<x?mmt}&. ^-K7V**asB©tt« (bmesu 

-(SI2 0 3 ©ftttSif 8 014, /n- KT* 7 7 m 
BT-«$£*l5RAI D5t?fe-5-i, ¥££§4fofc?> 
©«#=»* h, «j«-r5'>-K7V*7«Bfcft. /v- 
h'fjxtmwa&m (EHBfc, i^-^ftW, 
4« SrfiHWS. 4 or, f5rgfS#JiS:8 0 1(4, ft 
ttB#¥B8oa»br:|xb©ftttflra*»9fflttl4\ * 

©7^nN &b# ir© 4 5 *b« t mm&zm-r 
«tt©B«*s*a*u\ fct^rt, ^bsisijt^ 

B^«i5fe^ $ ttTV > S p< * V 4rfflv^T fttt<*^¥a 8 

ttTV>51f«©^tiiL-(CI4, SCSI (SmallComputer S 
ystem Interface) "CSeE^ttTV 5 INQUIRY^-7 
>WMODE SENSE=-r^K**lJffl-f5o 
[002 11 jfcK, 5^^?P*Bt*ttfiy*#BKo 
*HJg^(w*3^T, 7^7"7!)|?t3 
01(4, 2^ODVD-RAMK7-f7'3 0t, p»7>f 

rsr*&tt"5rte*4fi©B3 1 fc, ^7^7*^73 

OkfflS 1 i©R5T'«R^-r5lo©«Rj§»«3 2^W-f 

«B©^, «»««©»* £fctt#Lfcv\ 
*^07l(4, ^7 y«B*»V^Ttfll«t?t 

So ^mmmmx-^ 4W3Hjifw:i«:fo 

DVD-RAM^f-f7&Mt5t©i:t5. 7-f7* 
7!)8i3 0lli, B2S««3 2«:fflvvt, ^ytv- 



-4- 



[0 0 2 2] ?4?y})gi&30 llcteittStifc^x-f 

^-9 ft, *.?yj?yvmm^ m&tc\,^-*<D&, 
misfix^z*7jT*mfrbYyj7\cmk-fz>*io 

b<Dmm7v>m / £*&<>x, yjj? 301^0 

k, YyJ7\zte^X*TjTftWi&ML^m\zftZ>t. 

x<n$ff$ (±K*7jTtm*&L^m*®fcmz»r 

2 0 2, 2 0 3*^f-^g^ffiLl:H^WI^ 
[0 0 2 3]^»im y-(zfyVmW3 0 1\C 

m&&n^&8on. dvd-ramk7^^t«j 

Oft. y4 7yVmW3 0 1ftKhttn&i!£<DUm^ 
[0 0 2 4] ttaffa¥Bi:*lft*l5:ov>TRW 
[0 0 2 5] — 9 15 111 &WgM^Wl8 0 

urn**. jM*ttfc», ttftaft8oin 

UTfc^o ffig^»?J£8 0 1 ft, 1 0 1 

J S'10 2&ttLAN6 0£:#LT««i-3o iIf§cofc#> 

fct*.tfTCP^UDPftif*3K]ffl-C#5. ftgfa^ 
880 1*1 ^yt°a^l0 1^10 2H77^ 

A£gtt&£<!:, t07r>fM-A07r^^ 
yfa^isiM&t^ z<Dtc#> s yrJ/is*— 

[0 0 2 6] B2(l, «lft*8 1 lO^^-To 

$81 in h^yfa^ 1 5 icD^^eyicft^ 

£tl5o £**8 1 ltl 10©77^^A»l 

Hif«, &£xfyTJ*®m<o&y4-/i>Y&ffiz-Z 
an. e«3s«#-j§\ ^r^r#^ lb 

A (Logical Block Address), *5 £0^— ^g^o: 



WN (World Wide Name) Sr^ffl.b^o WWNfc 77^ 

*JllfeJgffi"Cri, V^^TWIil2 01 OWN £ 
T201J , f^f^^7W8t2 0 2(DWWN4: T2 
02J , ^^^7WS5I2 0 3<0WWNJ T20 
3J . 7-f^7y^l3 0 1OWWN4r T301J £ L 

fco *^r#*it* mtttezmfemmfcy'fyy}) 
^t^tos^ (*tzftm<o&%) LB A 

-&fctt— 5 1 2/M hOr-fZ&jfoXZs DV 
D-RAM^x^Tc0^iCH-TO(-2 0 4 8'U h 
<D7*-9t:temxZ% 0 S61/ny?7K^tfi, - 

[0 0 2 7] fcfc, ffiB§a^St««^3l*^oVNT 

kw-ts. ftawa^S8 o lien ttiftraxs 1 2 

fl$8 12(t r^^^7Kgf 2 0 1, 2 0 2, 
2 0 3Og^lC^^^7W86ft^ 7^^7!) 

Sf3 0 lot^ictt^^rflHcjasti-s. SWOT 

$8 1 2H ^<D^M^fc^^x>rTlc:*3V^^^a 

ttfc^^i^o ^ ^ inf-^#*> 9 , roj <dw& 
\a**tiimxhZ>Z k Sr^i-o ««31$8 1 2H 3 

yifoL-^ 1 5 i<&^y tefcttSH*. 

[0 0 2 8] &tc, »ffi»#£*feat::o^TRW-r 

ayfa-^ 1 0 1 fcfcffl-rsa-f 1 **, y-^xxtm 
«#-T5»^*ific«si-5 (5 0 0 0) Q 

[0 0 2 9] ^yta-^ 1 0 1^77^f^^fAl 

aaj *^57^fc^s«»fl^J9?!iT«\ 3^ 
tr*— 1 5 1 o&ffi1fS#IS:8 0 1 (5 0 

0 1) o r^^, ^yt^^ 1 0 1 It, y-'—f FA 
AAJ (x — 9 TAAAJ <bl^, 7 7 4Jl>*—& TAA 

Aj (07 7 4 tw&fty*— 9 XihZ) Of— 9&\zM3l 

X. 7 s — 9 TAAAj ^ifC0j:5^ff: : & : UfcV>^^^:-r 

ft. r^iitcj , r^ft^icj . rgjffltcj % r^«^ 



-5- 



[oo3o] ftg^a^as o 1 3nr&&m8 1 1 

d^Sv^TV^a-ffSrfilUfflL (5 0 0 2) , y^htf 
yhfc rij (CUT (5 0 0 3) *ftii*©IB»«rSa' 
U 7r^/^-A^t^tP (5 0 0 4) o ICttt 
3E»*8 1101#@ (#1) 0>&B1f«£««bfc£ 
-TSo «g«WS^8 1 2*»fe, 5*-* TAAAj fcteifft 
+5©fcjKilft««SrtftSIH"S (5 0 0 5) o rr-C 

(5 0 0 6, 5 0 0 7) , tegWS^&S 0 1 ft, ^ 

5 (5 0 0 8) o 8 0 1 Wt. a^tTa-* 

3i$<D7 ! '(X*Tl"(mW<D t Pfrbs RAID5-C 
«*S*tfcf f -f^^TWi6«2 0 3«r3WlL (5 0 0 

9) , 4»fr*-£«*ttBS8 1 2^LBA^J^T^ 
*fc*Xb (5 0 10) , {fl«flH»*3 i o 

lfc#flft"5 (5011). 

[0 0 3 1] aybr a -^10HD7r>f^fAl 

W 2 0 3fc3Mfb (5 0 12) , T f -^$r##aitf 
(5014) 0 Z-<Dy4 ba^^K^SBfi^St&trr 

>t°^-^ l o 1 (D7r-f^rAi oii, -r^*^ 
7Wgl2 0 3^^^7OSe (5 0 15) 

H:tSaW^7lfcr^?:a§t5 (5 0 16) e 
[0 0 3 2] &fifS#J£8 0 lit, **&*^T<0« 

ttl#B<D{fcBtiH8fc. 7>f^^TKSi2 0 3OT 
WN, LBA, fcilf^— (5 0 1 
7) , #8Slfy h£ ri j \z±y hi"£ (5 0 18) r. 
fc-C, 1 f I ©ttHIW#*SJ-C*5 i i Sr*t, 
*g\ f-^ >f ^ ^ 7 Kgllci^^^r% 

/i/JBlfctt, 3R«ffrtt*©0l*«r»*atr (5 0 1 
9) 0 y4 YVv r 0 j fcbT (502 

o) , »ta*«r*Ti-* (5021) o 77^;^ 

fAl ott, filffl¥S8 0 5&*^T<0« 
^-fr-fh (5 0 2 2) 0 rftldJ;*), 7^!)^>gy 

frbv>mmxm$km*%T-tz> (5 0 2 3) . 

[0 0 3 3] i 5 l^f 7 n t " h y 7" 
5 0 0 6fc*5Vvt, »J9 ST^Iffi*««d5^ofc»& 
(Ctt, S^i&#»att*B^*S (5 0 9 8, 5 0 9 
9) . $^1775^/5 o o 6-CSI!? StWj:®8 



KhZtK «ya^y^5 0 0 7Ta»©E«BSBlc« 
##*ia>o ^a^r^S 0 l O^tPo 

[oo3 4] K^abaafcov^-cRw-rs. b 
i o i fcfcjBra^ira. v-yutzwryv 

fB>*m*X. ffif@«#bT^VNfc3:»SrSK^ii* 

+s«*44rK«ai-s (6000) . 

[00 3 5] 3 >tf i — ^ 101O7 7^/^7A 1 
Ott, -r-*^^ h6 OSr^bT, 7 7^f/^ TA 

a aj ©^-^«)B*wbSrtt«ffa^a8oifcB* 

t5 (6 00 1). ^Mta^©8 0 ltt. £&*8 1 
l^?>7r^^*-A TAAAJ fc*fffi-rs*T«raiRb 
(6002) . h try has roj -c*>5r.t*«« 

1-5 (6 0 0 3) o tb9>f hbTy h^ r 1 j <D#g*fc 

<5r— ; 1Ff—9 TAAAj *%m*¥Xh 
5©T?K*ffib£1^5ria^£l/> (60 98, 6 0 
9 9) . 

[0 0 3 6] W:, — KJpvUfc'f ^ !> ^Vh 
(UP*), b (6 0 0 4) , TAAAJ (DAot 

V^fiUfWtay^-^l 0 lld^-TS (6 0 0 

7) o t*-* TAAAj fc*fjSi--54fc«*aSrlobd^ 
fcotv^v^KB, *©tt«fflf«Sr«ftr*b** 

i/tr-a— * i o lK&mft athztt* zzfrt>m&m 
*"©#*t>av^4if) ttwiHR&attb (eoo 
e) % aei-a (6007) . ±e**»# 

i^T'S^&A/fcf^ TAAAj ©fiH(f«t*)St 

[0 0 3 7] 3y^-flOKD77^f/^7Al 
Ott, #B1f& (WWN X LB A. ^-*fi) (Dm^Z 
%VZ>t. 7^^7Wg|2 0 3C!)-K^yK 
£3§?Tb (6 0 0 8) . x-^SrK^ffli" (6 0 0 
9) o r<7) y - K^^> Kc0^fif<b y - Yy*-*<D&m 

1 0 lc7)^7-</^>^xAl 0^, 7^f^^7Kgi 

2 0 3^5>^tHb^70a^ (6 0 10) $r§ttK5 
^r, ^>-t: 0 ^-^l 5 10ttiffl#a8 0 1 *cK*ffl 

. U^JLfc^^^tS (6011). 
[0 03 8] fegtS^S 8 0 1 {4, gg^ffl L^T<om 

(6 0 12) . V-h+Xt^hZTtV h 

(i«») i-<5 (6013). y-KJS^htfo-e* 
fSt^— ^ taaaj ^mt-x^^^zt^ 
b, o^ftWtbtfiKtt^Wb^-c&srt^b^c 
77^/^7Ai of4. ffittfa^as o i*»fcgiE* 
tbb^roas (6oi4) *rgtf*at, ryytr- 
vic»*fflb©^7Sr«e-r* (6oi5) e m 

lcj;«9, 7 7 p y^-^3>'jii^«SIWtllU^7t 



-6- 



3(6016). 

[0039] ®mm®<D%mco^x®.m-rZo 

a— tf(4, =• Vf ^-^ 10 1^10 2±irffi«Ufc3. 

jA,i/x<rJ* l Oii, &BS t 3#g8 o l fcSJUHfB© 
1 1 i.*)?-9 TAAAj 0mmtt*K&HL 77 

-OW/*7\6.l o^m^-tZo 7r^;v-WAi o 
(4, ««W«*Sit»5i:, a-y-f yy-ffc»LTH 

•MNiMMrt-a. irm <4Blfs#gi<8 o i 

3. 

[0 0 4 0] Bfc, W Wffflcj oMMIMft* 
rffiffi&tcj fciESi-st, 77-f;wfAi o(4, 
bb»b*»8 o i KMMff «©##&*£g*i-3. 

BB*B#»8 0 1 (4, £&3I8 1 1 Or-? TAA 
Aj K»lSr5*MWt««r rfi«ft|Cj ££XU g-JE 

[oo4i] wffa^as o i isemt^ 

ttfcOV>TISiWt*5. 0 7 1 118(4, ffigWS¥K8 0 

f-T— ht?fc5. rrfli 7— r A A AJ 7 
^7Kgl20 3*>f)7-f77ySl3 0 1 \C&W)-f 
*«KoV^TRW-f5. &Sfa#S8 0 lit, -£© 
^It?ftS8 1 l4r»9iKU»^LTV>S (7 0 0 

o) . £iiif. f&i»B«& rneffidj a»e 

fcj -s3EHt/C7-^ taaaj icovvft, c©-£ 

mm<omm-e£%.i)m&zth5 hood , itrnwrn 

B^FirSrltlSEt (7 0 0 2) , -JgJRPfl (tctx.\f90 
Bkfr) go-CV^5d^S : ^Srf 1 3iy^i--5 (7 0 0 

3) . -mmmm-ox^itm^m, ®W)tt&xhz>h 

©tUT, 7—9 TAAAJ £49 fc*7 h=** h©£l^ 

ibb&bi^ih-s. ***»«rr*, T^nM 

£?B2 0 1 i: 2 0 2<D&ft=*x [\0\ , 7-f*7 
7^^g2 0 3®ffij|3^h* r l 5 J , 7-C77y 

£B3 o i©#8=>7 h£ r i j k-tzh. teats* 

&8 0 1I4, y^rfy 0 Hzt-9 Taaaj 

ZtefhtZ Z t X-y- 9 ©{£&=> 7 h©(6«£l!5. 
[0 0 4 2] ZZX* &Wgm^Wc8 0 U4, ^m^8 
1 1 -c, Mr 7 h*s r o j t? ( 7 o o 4 ) , ;> - k 

roj T*3b5i k&m&L (7 0 0 5) , 7 

^ ht*7 hSr ri j jcr^u (7006) , mzm& f g 
a^8i 2 4r#p,L7-r77 o lco^^^r^ 

f>7-? TAAAJ «r*Mfrf 



S. #2a&Jgllfi©#&H\ ®tt«Rf¥S8 0(DJStt^ 
*S^b (7 0 0 7) , tWflffcfrfcifcfWifttta 
*h©J:9«^e*SE«fc*RL (7008), $r$g 

(7 0 0 9) . 
[0043)7-? TAAAj 

Jlo*»5 i (7 0 10) , tegfa*I!:8 0 1 tt, afcS 
■*5g«6tS3l8 1 2©LBAft«9'S-C»*fc«Xt 
(7 0 11) , fflf«&§8#&8 9 0fcf*-* TAAAJ 

©$ss&s#i-3 (8000). *nM%mxte, urn 

«£!¥S8 9 0£, ayf a-? 1 5 lfcKttTV* 
ik 77-1^^^ -7f50 1 icRftTki^ 

u 77'f^t^'f -7^5 o i izmmztizmo 

StS§lJ:S»7-Ct>«tv\ 

[0044] 08ii, flttt*Sl?ft8 9 0ti:.ts%H& 
m*>*M*»t. flHRttims 90 *1\ 7-T7 

7y^@3 0 1lC»LT, ffitWbflJt-Ojfcofc*^ 

t% F7-r73 oic^-rsj; o\mmfr$z%fi-rz 

(8002). 1-eiC^©7:7 ? ^T^K7'r7'l-fc5 

7:7WTS:«6illi-5i£Jg«^^ (8 0 0 
1) . «B3@lt8«3 2tt, mfciStltcty'jTZ, Ztl& 

(8 0 0 3, 8 0 0 4), ffi®.m$&W>8 9 0 tt, 

(8005). ifcfc, if«fi#©8 9 ott, -r-etT* 

-? Taaaj dSStiiSix-Ci/^T'^^^T WSES 
20 3^LTU-K=>^^KSr^fTU (8 006) , 
7-*&lc^UiL (8 0 0 7, 8 0 0 8) , J&@§£ixfc 

L (8 0 0 9) , 7-^SrStiitf (8 0 1 0, 8 0 1 
1) . 1f«*»#S8 9 0(4, 7-f 1 h^-^yK^Tt 
5t, ttlffl¥S80 lt««^TUfcrt*iS 
t5 (8012) . 
[0045] 7 djgoT, egta^© 8 0 1 

14, mSS5S7©$R£££»7&Si, «^8 1 1©^V^ 
TV^fT, ri-C(42SS©fittt»ffilC, 7-f/7!)S 
H3 0 1©WWN, ^7^T##, LB A, HiHtf— 
?ftS:»#ii^ (7 0 13) , hSr rij (c-fe 

7 (7014). jDctC, (ftgffa^S 01IJ, 

i#aoitiniMi©#»trjy hSr roj ic-rsrt-c 

(7015), l#i©fiC«fflf«SrSS^lcU, 1#B© 
tt«IMIte*W&-*-3«f»a*8 1 2»LBA!;*«ffl 

ic-rs. ttB«a#a8 o 114, 77^^11 

|c#»B^4rS#^, 7-f Mf7h£ roj (CLT# 
S)^71-5 (7016). 
[0 04 6] Z.<Dy—9 TAAAJ ©^ld©^, 7ci:x 
(4, 3yifj.-?i0 2*f>7-? Uaaj (om^-iH 

usg*^Wci:ut), -attfafisso 114. 9^ 

77^ 3 0 l\zm%?i£th± : f-9 TAAAj ©(iB^riE 
L<«^-T5^ r*tl4, (4Bfa^8 0 

l* 5 , ^^8 l i©{tBBS4r-7cWlctaLT^5 



-7- 



0U102 Kfc&i-SiWtt*^. 
[0 0 4 7] *fc, t>Lx~* TAAAj 

£\ ^m'sm^ms o i 2te/f^ tk^lts 

[0 0 4 8] 7^^7^i30 lKMASttfc 

yst3o ncmAdnr^-^ taaaj ok 

tf, ±e^**Tl"fK«2 0 3*»fc?>f:/?y»B 
3 0 1^7-^ TAAAJ £>&tt& *1 
0 2#, TAAAJ <DK*fflL«rS#Lfcfc*-5 

(1 5 0 0 0) o ^y^^ 1 0 207r-f^f 

7 4fi,*—j» TAAAj iV*5^-*<&K*fflL*i5# 
"T5 (1 50 0 1) o 

[0 0 4 9] ftfflfasom, 1^81 lfrt> 
yyf^-J* Taaaj ld*HS-r5ff*a«L (15 
002) , yyf htf > has roj -efcsr£ftfli^L 

(15003), y-K^!)yhH^^yMi 

*fflL&3ift&ll*&-f£ (1 5004). &g^l#&8 
om, r^^7K^l2 0 1 tc-r— * Taaaj 
S:tMA?I«*ffi#«*Srtt*-r5 (1 50 0 5) . 
«***od>5i:, 8 1 2<£>LBA 

SrW5 3T»*K:*3EL (1 5 00 6) , 

8 9 0CSl7^ TAAAJ ^7^^7^1301 
6 (1 6 0 0 0) o 

[0 0 5 01 HI 6£, ««^S!#S8 9 0\C£Z>m$l 

tira«B¥»8 9ott, r A 
a aj <D#mztiit*?*4T&V7473o\zm$frt-z> 

±5fc5>f:/5!J$«3 0 1 Ktg^f* (16 00 
2) o VU x-* TAAAj ©IfriWSih/Cl^^ 

ft^deooi). mmmffi3 2\^ mfeztut**? 

x^Tftig&'f £ (1 6 0 0 3, 1 6004), Yy<< 
(1 6 00 5) , 7^^7ygl301IC»lT!)-K 

^vyK^tTtrf-^riWL (16007, i 
6008) , f^^rwii2oi 1:7^ v^^y 

KftlS^L (1 6 00 9) , T^ftSt&t? (16 0 

io, 1 6 0 1 1) o fli«wa»#a8 9oii, 

vyKKTt6i, ttfi9I*a8 0 1 KfflSW^T 
Lfcr £ft«£1"S (16012), 



[005 1] Htm 1 5 CMot, {fc«fa^S8 0 1 

«i^T<D«»*:a«-Bt&j:* x%£8 1 10^ 

XVSttfiftWCv* -f X^7Kgf20 1 OWN, 
LB A. dSitJ^— *fift**i&* (1 50 0 7) , t 

#iif^hft r i j tc-feix m (15008) , roffig 

ftSSft^Vf^ — 1 0 2<D77^f/^7^1 0\cm 
(1 50 0 9) o 
[0 0 5 2] a^tfa.— * 1 0 2^77^/V^fAl 
0H. (fettlta (WWN, ^f^fTS^ LB A, 
*g) ©««r*Srj5i:. ^^^7^^120 .lfc 
5-K3T^K*»frL (150 10) . x-^SrK* 
Bit (1 50 1 1, 1 5 0 1 2). ^Vt 0 ^— 102 
©7r^^7 1 Al Ofi. yj*?T W^g2 0 1 
^bK*BJL^TO«fttSrj*Si:, 
5 1 O^SWS^g 8 0 1 fcft*ttt L#3feT Lfc £ t ft 

(isoi3) 0 ttKffs#a8 o 1 k* 

IBL^T*>«»«rfitf*Si* £&3<8 1 KDyy^jv 
JiSte*KT^-fe^BI*i:«HH«P*:»ta* (150 
14) , U-K^!>yhS:7^^yM (1501 
5) , ^Hit^Tt^ (15016, 1501 7, 
15 0 18). 

[0 0 5 3] n<Dj§-g\ tctx.y—? f AAA] \C IM 

max-} <DM®mm&*Ktbtix^tzti,x^ rai 

D5"C8^^TV^^^7W 2 0 3C77^ 
TAAAj ftttfi-rS^BttttiKi* Lt>*V\ 4*4 
?>. 7^7 r 7 , ;§l3 0 1 tRAIDO^f'f^^ru 

^812 0 i<o2«0fi^-^#fiy«$ixrv^5fc«>-c 

[0 0 5 4] r^T\ fifflf^8 Olli, ISJgL/c 
7^^7W8i2 0 l±Or-^^-M7^t 
^**tTV^4^r.tSra»i-.5fc, &&3E8 1 lcoffcg 

tim<o#3&*V h& ro j icy-try h-r*. r^gs. ^ 
«Hft«fl|«iJ:»tt:i-5«««aa8 1 2©LBAHt*tt 

[0 0 5 5] jfcfc, W«til«<o^y^— > 3 yicov^ 
^*t*"C^ r®<ffife(cj <t rss««fcj 02 

[0 0 56] rsigtcj 0!>»J«I««d^X.5>ttfc»^lC 
2 p^RA I D 0 T y-2T \s4 ^fi 2 0 1 dr 2 0 
2(0 2tt3flC7 f -^«:lE»^5o RAID0tt««tt*s 
RA I D5(Clt-<T^^fc^). RA1D0^^^7 

u>< 2 -^icx-^ ^se^-r 5 r ^ xmm\t*m#~r 
mftm*ttto£tix\t*tzm&\a±s 1 p^ra i d or 

[0 0 5 7] r^Jicj <DfflW»«ds^*.e>iifc»&ic 

tt, DVD-RAM^f>f7/j:^ ^^^L^Hg/ir^fB 
g^f-f7l:r- ^ftBe^-T5o DVD-R AM^x^ 
Til, 3 0^(D»#^p[|g-e$>5i:V^n-C*3l?. /n— 






©$J«i#m!HJ£ftT^ 2#<zm 

[ 0 0 5 8 J r^^M^J-j feftfca 

<W»U ftifl^asom, HfcS&«§3r©2fc 
nKSi2oi<osifP rjcstj -c, t*-*** 

7Wgl2 0 2©StfI*S r$&j -e, tV**T 

WSB20 3«tms0f^ r*sij ©a^., figwa 
8 o i », Kfi#3f©ttia§i#sr}ea u tjiokj 

>f$i«2 0 3©2«jJrfc7*-**E»r3. Jfi&*^© 

r t Srftgf 8 0 1 fcRjeLT* < . 
[0 0 5 9] 774'<'? J r*'l'5 0<Di&1gltmKM*:l 

WttZtzfrlZit, 77 / f /< ^'lr^5 0 4rATM (Asyn 
chronous Transfer Mode) 5 1 &<!fffi,<£>^ yYV — VK 

[00 60] r& jijic j ©fgJWlf s*s^.x. btifc^&ic 

tt, 'T-i^VT U^mm.Z 0 1 <fc 2 0 2<D2&ffi{C ; r- 
*«:E*U *r-<7>9T i^g 2 0 1 icfg^ L^ct 1 — 
^Ir^yt-a-^ioifffliU f^i'TKgi 
2 0 2KfE&Lfcx-^£=i>"f^-^ 1 0 2^fflii- 
•5. fifta¥S8 0 ltt, SS^fflLiSrII*Ufc3Vtr 

[00 6 1] rpj&Kj ©W«iflJ«^#ie>ixfc»-a-Jc 
14, 9 * Ui»nr«ft ? -f :/? !) £B 3 0 1 \zf 

-»*«:B*1-a. lOtf, WWtMBt^A— yiD 

(02) «rftaoLrttll*3I#a8 0 llc-S-xSri: 

•c, Gentses o i 

-**ttfco©*^Tte&*3£5fc7*-*S:E»-t 

1 C^w-T* I D£««?-*-38#«r«S:tta&g#;fcS. 
[00 6 2] Ztl*X>tdM.LXZtcmifoBmX< 

a«fC«!#Pfl!f«SraE3l-*-Srt'b-e*4. £1T"CH, S'J 
[0 0 6 3] HI 4 li, f^Xst bVffi&O&l&TF-tmT- 



5 WMt-Ctt, BWjSKft K 7 i lea 2 o 
<0i)-^7*^2ir3A5*>t). 7***3 (rtt$t>{t 2 

orot/7*>y4 £ 5#S$>3„ 7*— * TAAAJ 

^<D7***4\z.WbtbtiX\^b-*Z>b. 7—9 rA 

AAJ fi, r/l/3/4/AAAJ ifc5„ -©£5 

^-^•x.r*3t, *©7*/v^K*&ift£ft3:7r^^-ir 

, 7***2\at rasatj , 7*^3tc« 

ttjMtt*3JlS. 7***3\C7***4k5& 

4fc r*JBtj ©Mftflltt*£ai-*-3fc, 7*/wy4fc 
ffi^^Sf-? TAAAJ fcf±, ?7*fl'h<D$mtit 

I o o 6 4 ] *fc» 7**y3<Dmwffinzmj£-tz 
-fiur^-rsrt t^rtgr*fo5„ 7***4 

fr$>7*)\<*1\Zrf-9 Taaaj Sr^SlUfcttlc, 
-f-r-Jcx-^ taaaj ©«foTV>**J«H*aSriNIJ*-f 

(DMrnm-rz-r^xntSfett. -KB^a^s o 1 

[0 0 6 5] ilB^^OVX^AICj;*!^ ffiBW 

ic J: o T*W $ ^ttft^CttSBOif 1 ^ & a9«ett 
8B&S1RU 7*-#Srfi9*i-6Ci*»-C*«. ffiBW 

[0066] jwc, *&mot& 2 onmmm 

[0067] Blli, ^2 ©Hl6^li©fX'r AtS^H 

a-? loiil o 2lrdf^s/->^T r ^^i' 9 o irita 
g«#g:9 l«rK^TV5^i, =^-#10 1, 

102, 7 ? -r^^7Wif20 1, 2 0 2, 20 3, 






ttTl^£-Cifc£o 3^^-? lOlilO 2KRI* 

77>f^ft^^yf5 0 l^e>100WWN§ri$ 
o/n- Ft* -f * * «■ i LT * ft* <fc 5 * ft 

[0068] nmmmtt® s 2 ko^trw-t 

^7WS|2 0 1, 2 0 2, 2 0 3tRttfc8i* 

**LTV^ 5>f^?5««3 0 1K:Krt 

[0 0 6 9] fcfc, ftgfS^&S 0 1 fcov*TRH+ 
5 0 IB 2 <0^J£^^(DfteWS^© 8 0 1ft fl 
■»U«iBI*R:H2(OSC»«8 1 l £fflv>8<> 
^8 1 2(1, 7^*^Tl^^®2 0 1, 2 0 2, 2 0 

=l—# lOltl 0 2fcRttfc*-y y^a^W 9 0 

[0 0 7 0] mz. «HR»*ii*«HIK:ov^TRW-t- 
8 C B9fcBl0tt, E*§£«fc^*£»fti;:»*& 

10 1O7r^>^fA10fi, -f — lM^h6 0Sr 
^LT, 7 TAAAJ £ I * 5 7*— # 

s«fa««woarsr3vtr*-^ i s i 

©8 o licn^-rs (9ooi) . #Kf«^g:8 0 1 

(9002) , ^ htrjx h£ r i j icurst 

BM&«rS*L (9 0 0 3) , 77-</^-ASrStatf 

(9004) o ::m ^^8 i ioisi©fif 

i/ a f>f^9 0H, X — ^ TAAAJ SrJMfrrS© 
fcfMW«*&«*tf«* 8 1 2 J: 0 fcSfrT 6 (900 
5) o ::m +£4£#«*#*>*ir<&i:* «0 

0 111 »*&-r4«*»a*8 1 2<0LBA**JD3"t 

(9006) , &mfitm*=»VzL-*i 

0 1 fcfMH"8 (9 0 0 7) 0 

[0 0 7 1] nyt'a-^ 1 0 1 (077^/^7^1 

lex— *£5t&*>fc#>£>7>< bYyK^^yt^- 
* 1 0 l<D*wi'a.7 s 4X? 9 OlcXfrfS (9 0 0 

8) 0 ro»ta*H:i^~3vtra-#rttffta*hs 

fcft, #*?«>rfl55i"efc5 (9 0 0 9) 0 yyjfr*/* 
7A10I1 ^rt ^'>a7^f^^ 9 Od^StMxT 
<£>$&e (9 0 10) SrStflRSt. filfl^8 0l 



lc#t»*#£TLfc£i:**#1-5 (9 0 11) 

«m^8 1 l(Dl#@(Dffigtf#{C x aj/fcT*-* 
lOlWtyv'^^^OOWWN, LBA, *5 

(9 0 12) % tfftlfyhfc 
r 1 J frfey (9013), 
[0 0 7 2] ttlfflfa8 0 1tt, aVfcf^- 

* 1 0 lJWOayi?a- ^±<0^riry^^^^9 
0i7^^7W8l2 0 1, 202, 203tf>fg® 

w>s#a8 2^b, **t?nww*«»L-(9oi 

4) , -S^*<&ev>E«£g£3«L (9 0 1 

5) , 7^^7N 

-fii2 0 1 l^lT"^ TAAAJ Sr»jW"f-SOlCiK 

taaaj 

twiod^t (9016) , »^-T8®*sta^ 

8 1 2©LBA«r«0 3T»*lcaeML (9017), 
flMR««¥«8 9 0K:^-^ TAAAj t0®»Sr®*-f 
8 (10000) o 

[0 0 7 3] HI 0IC, W#m$U#g8 9 0lC±£»S! 
^Sc0#Mg^^1- o 1ff#«S^S8 9 0Ht. f-e^x- 

* taaaj ^s^^ttrv^Sa^vfa— ^ 1 0 \<o 

^vi/^fA^9 9 OK#brmi9M¥S9 1 
Lry-K=i^VKSr»ffL (1 0 00 1) % f*— * 
TAAAJ Srtt^fflL (1 0 0 0 2, 1 0003) , 7 
^^7WS120 \ h^^Vimfl, (1 

0 0 0 4) . x— ^ Sr<$3&4> (10005, 1000 

6) 0 Mil!fi8 9 0H h^VK^Tt 

st. fits^s^asoi^asg^TLfcrtsra^ 

t5 (1 000 7) 0 

[0074] Wi9}:lot, ftff'ffS^Ss 0 1 

tx* mm&Tvn&z&imzt, %&ms i 10^ 

rr-ctt2«B^tettfflf«^ r>f^7i/ 

^^I2 0 1(DWWN, LB A. ¥<D 
(9019), G»\*y h^r Tij \z± 

yhi-* (9020) o *l/c. ^r-TA-BKlcrt £r 
*ff*i:*OBI», «®BB*£«£&tP (9 0 2 1). 

htry h& roj iat (9022) , t 

#5i**^Ti-S (9 0 2 3, 9 0 2 4, 9 0 2 5) 0 

[0075] m^. R^fflbteiicov^TRiHra. h 

V^T, "T— ^ fAAAj ^7hay^-^10 1 
to**?*/*?-**? 9 0 7>f^^7Wil2 0 

i±(c$>5„ ^h3yt^^io2^7*- 
^ Taaaj *:K»Hrt1fr& (liooo) , 

1 0 2 077^f^7A 1 0*5, -Y— 

^F6 0^Lt77'fM^ taaaj <d^>-?<d 
R*HiU*rtt«*a*»8 0 1«JUI*-fS (11 00 

1) o 6111^8 0 111 »8 1 






/i^-A TAAAj K»lSi-5ffSra«L (1100 

2) , ^h^bMoTV^^igl (11 

003) . u-w^yh^y^y^yhn (i i 

004) . 

[0 0 7 6] fcfc* &M^M^SiS 0 1 n *f—9 ta 
AAJ O&tt^ftT^S** hay^-? 10 1^ 

*MWI¥«8 2ft»MMMK*SftL (1 1 0 0 6) . 

-*»^0«v*BraiI*Wtrs (l 1 o 0 7) o 
3^tf^— ^ l o io*ty*/af^? 9 

0(08i¥» o fctt$ 0 ftfitfSf ©8 0 1 n 
3W»JI*8 1 2^^y^a^ 1 0 2<D^fy^f 
^^9 0^f^ Taaaj Sr»itti-5oiCjK5/i: 
(1 1 0 0 8). 

8 1 2©LBA*«0ST#*fc3BEL (1100 

9) , 1MMUm8 9 0lCfttU 3^^101 
©^^>af^^^ 9 Ofr&aVtfa — * 1 0 2©=^ 
-ty^A^-f** 9 0Kf*— * TAAAJ £at*-^-£ 

±5i-B*-ra (12000) 0 

[0077] mi 2\c^ tft%mm^m8 9o\z£z>wt 

4aao^KBSr^ 0 l#fB«M^J£8 9 011 ■r-cfcf*- 

9 FA AAJ #*4f : $&i£;ft/TV^£ a ^fc*^ — ^ 1 0 1 CO 

9 ofcttbry-K^vKSrSM? 

1 (1 200 1) .f^^HiL (1 200 2, 1 

2 0 0 3) , 3yk° a -^10 2O^rt5'^f'f^^ 
9 OlC^ h^VKSr^tTb (1 2 0 0 4) , ¥—9 
Sr»tat? (1 2 00 5, 1 2 0 0 6) . 

8 9 0ft, KSr^T-TSt, fiK^a^JS 

80 1lC«B#^7tfc^iS:*§tS (12 00 
7) o 

[0 0 7 8] St/Hl 1 Clot, ffi«fS#B:8 0 1 
n «««»¥a8 9 0^e>*»«)^TSrSWK5t, 
^&gt8 1 l<0£^T^£fi\ rrT-ft3#@<7)ttBff 
SI-** h3 vtf^x — * l 0 2(0=*^;/^^*^ 9 
0CDWWN, LBA, :fcj;l^-*fi*^<0ffi«£g# 

a* dioii) , #3* try h& r i j ic-fey hi-* 
(11012) ._4fc, ffiBffa^as o in ^r-T 
^JBHk:«HHW«r»#5&tp (110 13). ^Uf 
TAAAJ ^)AotV^t^h3yfa-^10 2 

WtyVa^^ 9 OOfegff^^n yfa-^ 1 

0 2lC«fe-T5 (110 14). 

[0 0 7 9] 3yfa-^10 2W77^/^7 : Al 
0H (MffiffR (WWN, LB A, f*— **) <Dn^& 
SttSt^ s^tfa— ^ 1 0 2©^t^>af^^^ 9 
Oicy-K^^VK^fTt (110 15) , y*-9£ 
Kfotb-t (110 16). ayfa-^ 1 0 2(077^ 
;^7A10I1 ^7^7^^9 0^^ti] 

L^jcom^ (11017) *%imz>t. ^>tv- 



^151 ©tf««a*a8 o 1 i^mu^Ttt: 

t«r«ft1-S (110 18) . filflfgSOm, 
T*ir*0 (110 19) , 

h&^jM^bt (11020) , K*mt^T-t- 

5(11021, 1102 2, 1 1 0 2 3) . 

[0 0 8 0] fcfc, x-^ojEi&fteltco^TUiWr 

@i3tBi4n ?-9*±m%\^mrz>im 

i^ffliMl^v^T, x-* taaaj n 

yi^a— 9 lOlilO 2 <D* Wis ^7*4*9 9 0 

y*<{X9Ti'4giW:2 0 1±<D3mffi\z.hZ> o 
T\ ^h3yfa-^10 2^f^ PA AAJ £H 
i&f1-5^ (1 30 0 0) , ha^fcTa-* 1 0 2 
©7 7^WTAl0lt >T-t*yh6 0WhL 
T, 77-f/^-A TAAAJ <Dy ! —9 (D'gMir&MM 

m^®8 o i fc»*rs (i3ooi). ffigwa^a 

80 in 1 1^5)77^^-^ TAAAJ 

R**1-*fr*3WlL (1 30 0 2) . 7^ Mfyhtf 
roj fD-K^^Fii roj -efe5^i^«mf5 
(1 30 0 3, 1 3 0 04) o hfcTy h# Toj 

fcv>i§£\ t-fci*y-K#*vh# roj -cfcv^ste 

tt, X*rffe»i:*5 (1309 0, 1309 1, 130 

9 8, 1 3 0 9 9). 

[008 1] fcfcU #gt3®¥&8 0 m Hf* 

b£ ru fcurWff(DBBi»*sm-r6 (1300 

5) . tlf^ r A A AJ ©7 f -^g^#<4ot 

^fcMCIi, ®WffS&8 1 2«rJBvvCS*££«* 
^iftM^fcS^ (1 3 00 6, 1 3 00 7). 

r^-eii-r-^fi^^^-f 5 o fta^s^s o i 

n n>-fc^ — ^ 1 0 2©77^f/^7 A 1 OtC, ^ 
^8 1 ltf>3#B<E>ffigffi$& (nytV-^ 102© 
^r-Y yisx.'rj 7*9 9 0ft<Ox— ^ r A A AJ ^tv^?) 
ZmS-TZ (1 30 0 8). 
[0 0 8 2] ^^^^-^10 2077^^7^1 
on |i«««(0#fifSrS»t5t, itS**ifctt«^ 

0 2©^r-t y^^-y 5 ^^^ 9 OiC^fTi-5 (1300 
9) 0 1 0 2<D*\yis^7 f J*9 9 0 

n (130 10) . »*a*S5T* 

(1 3 0 1 1) 0 77^/^fAl on * 

ifa-^ i 5 l ©filf If 88 o l irtf 

ii*#£7 Lfc d £: £fg£rf 6 (13012), 

[0 0 8 3] fig^3®f^8 0 in ttii^T^ 
eSrSHtSt. 8H(»8 1 lOlSB (a^fc^-* 

10 1^t^^7^^^90^7-^ TAAAJ 
Sr^i") <t2#@ (7^^7W20 1te7^ 

r a A aj Sr^-f) HfrUhx-* r A 

AAj Sr1§^1-5J:9{c, flWR««*H;8 9 0lc!l*i- 






5 (1 4000) o 

[0 0 8 4] H14IC, ^nW&^S 9 0\C£%mi 

* TAAAJ jM^iiiftTV^aV'lfsL— * 1 0 2© 

L (1 400 1), x-*£K^ffiL (1 4 0 0 2, 1 
4003) , 3ytfa-^10 1O*t^>a^^ 

90i:f^^rKgi20 lKyJ h^>h**m 

ttRSftL (1 4 004) , (14 0 

05, 140 06, 1 4 005', 1 4 0 0 6 * ) 0 Iff 
««K#&8 9 0tt, I007^( ha^^KfrJfeTi-* 
£ s #S<ffS3M£ 8 0 1 fc«»#£T Lfz^t SrftSt- 
5 (1 4 0 0 7) 0 ^fig^«8lc:^tt^1ff^©^g8 
9 0tt. 7r^^f t^/^yf 50 1 ^a^L, 1 

tU fta«8«^®8 9 0^ lo^iS^giC 

8 9 o */bv>5 r t asss lv\ 

[0 0 8 5] BtfBl 3^Mot, figt3®^S8 0 1 

tt* if affiiS^S: 8 9 0 a>b#K£T<B«#«r£tt*a 
7r>r^«KlcSJ|*3e»fH«FSr»ta* (13 0 1 
4) . 7^h^ h£ TOj IdbT (13 0 15) < ft 
#3i**^T-t*S (13016, 13017, 1301 
8) o 

[0 0 8 6] ±IB»2 0Hl£©?g»Oi/^ < 7 i ixJCj:n 
rt-C, ^fcr a -*±fcft#&*LTWfc4v^ 

**2«j^K±te«#r*<ot\ hai/fcTa— 

#±1r6Z5itZktfb*>Xhf*-**9z5zktiitt 

v\ 

[0 0 8 7] ftlC. *3^0$3 0£tt$tB (^M^- 

[0088] Hitt, m3<Dmmm<Di/xT*mfm 
^7^ryh7 0^Siim o ^yt^^io 

^tLr«fiB*T5o **lfe^ffi-Ctt^7-rT^h7 0k 

[oo8 9] ^m^jiTtt^v^-^ ioiao 



0 2 ft*»«0W»$JSKBM«-5 £ k 1>XZ £ D t> U 
n^tfa— * 1 0 2^77^^ft^^ 0 1 

1 1>tm» 1 0 k m« Jdi*lS»&fctt* 0 1 fcB*W> 
J;3i^ x^^f^5 2^U ATM (Asynchrono 
us Transfer Mode) 5 1 &4M/C8SRi~5 0 t>^>5A>A 
TM5 lfc^rttoaBK^a^fflV^TtBBftv^. 

[0090] mmm\a^x®>wt%o ty-t 

T> Y 7 otetck ^tf#W^SaS4iffc»*SJxfc 

iN5^-**r^~1f*y h 6 1 Sr^UT=?^tTa-# i 
0 Ufcttl 0 2Ij:3I0oI*£ o x~*£<>ftf> 
tSfcflMFl- S*W«t 8 (^-^^icjg^raft* 

««r»l»r5rfc«r*a-r4. tot, <r-*<o%fti$ 

[0 0 9 1] fcfc, K£teoivtIMJH-*. *m&&m 

Ufc*4k**fcfc9 0«»3^hi:, ttB*ami®X 
&^8 1 lfc*|»lfcf^ft,.77^Ml, *3<fctf 

tt««f«*t>i:i^ tm i f--fi\cM\,xm*r*. ttk 

*.fi % x— * Taaaj tW4 WSSB2 0 1 1 
2 o 2ici o 0R|ft»*tt. W:f^^rwiI2 
o3ic»»^ixrio0iB«#stt, *fc?-f^7y* 

S3 0 liC^WSftTl 0 BKMJtlfci^, :<^3 

X-^ftfS^* h = { (RAID 0Oficj$3^ h 1 0) 
X10BX2+ (RAID50ft»3^H5) X10 
0+ (7^f^7!)©R»3^H) xiO0) xf-^ 

h#2«£;hrC^5<Z>H\ 2>7^^7KCf- 

ffie^a^iisoiB, -^r^r^ct^f-^^ 

iflf^8 0 1lt 7T-f^JBJBo.5*>, x-^o^ 

[0 0 9 2] £fc s S'J^J^LT, ^ TBBBJ 
2teDVD-RAM^7.^7l: 3 0 

r-^^^ 3 ^ h== { (RA I D0^)fi»3^ M0) 






X30+ (y4 ~fy } J <Df&f$zix hi) X3O0X2} 

[0 0 9 3] hSr»Wr5lc^ 

^-^10 24r^LT7 i -^^^Ufc^ 
yfa-^ 101H0 2tt^&SWN£;frt-£0 
•C, iftO** h=i^t p ^— ^OWWN^M8 1 1 

[00 9 5] rtf>£?£^££, 

# y iy- £ >fxi-s<;vX ]) f-<D¥hb ^«5fe1"S 

IfoL-^ 1 0 l^fcttl 0 2^U3yta-i? 1 5 
[0 0 9 6] ±IBM3 0^jffi^fficD^^AI^ttl«, 

tzmmmnt, mmm^&s ttsws^s, *5.£tfl* 

5„ 
[0 0 9 7] 



[0 1 ] *wft<ommMn*^iri/*Tmjmxh 

[02] ^M^^ti-efe^o 
[03] *wa***i-B-c*>*. 
[04] v^ hysjs&s-rB-c**. 

[0 5] Sl^jSfi^ffilC^ltS^fairtai^aSr^ 
[0 6] *i©*JS»tt^»»t5K*Wb^ft*i-7 
[0 7] *i©iBigg«^»rt5tt«Wi#a<oE«t 
[08] *«8i3***r 7 n-^-Y— h 0-efo§ o 

[09] m2commmm\^^mmm^^^m^ 

[010] «S&S£^-Tn-^-h0-Cfo£ o 

[011] *2(Diafi»»K:j3r*sK*aL*Ba*r*-r 

[012] lMl^f7n^ft-MX^5o 
[013] *2 0**««lcferjS^ r ^O»R«!.aSr 

[014] ttSJtefflSr^-r^n-^ir-hH^fcSo 

[015] ii^»im>t7^y7ygiicts 
[0i6] mk&mz^-tyv-^^-vmxhz* 

10-7 7^/Wr^, 3 0— K9^f^ 3 1-1, 

3 5o-77^^ft*^ 51 -at 

5 6 0->f-t*yK 6 1- 

a, 8 2-»«*ai^a. 9o-*tr>af^^ 
9i-i»as«#a, 10 

1 5 1 -ayea- £\ 2 0 1-7 
^^^7KgI, 2 0 2-7^^^TWgl, 2 0 
3-7^^TK|i, 3 0 1-7^^7!)^ 5 

o i - 7r^^t^^^f> 8 o i-fi:ifl¥ 

a, 8 8 12-^fim 8 9 0- 










[Figures 1-4: drawings not recoverable from the scan. Surviving fragments include an LBA column (0-11) and the reference numerals 201, 202, 203, 301 and 812.]



[Figure 5: diagram not recoverable from the scan; surviving step labels are in the 5000 series.]



[Figure 6: diagram not recoverable from the scan; surviving step labels are in the 6000 series.]



[Figure 7: diagram not recoverable from the scan; surviving step labels are in the 7000 series.]



[Figure 8: diagram not recoverable from the scan; surviving step labels are in the 8000 series.]



[Figure 9: diagram not recoverable from the scan; surviving step labels are in the 9000 series.]



[Figure 10: diagram not recoverable from the scan; surviving step labels are 10001-10007.]



[Figure 11: diagram not recoverable from the scan; surviving step labels are in the 11000 series.]



[Figure 12: diagram not recoverable from the scan; surviving step labels are 12000-12007.]



[Figure 13: diagram not recoverable from the scan; surviving step labels are in the 13000 series, with a link to step 14000.]



[Figure 14: diagram not recoverable from the scan; surviving step labels are 14000-14007.]



[Figure 15: diagram not recoverable from the scan; the only surviving step label is 15016.]



[Figure 16: diagram not recoverable from the scan; surviving step labels are in the 16000 series.]



(72)3RW# ffn R« <72)369i# iSE* ft^ 

»^JIimJHll«rtJ^Ei##1099»«fi ttc #^JIimjlWm^E5E##1099#Jfi «c 

(72)38W# 3S# 

F-term (reference): 5B082 EA01 EA07 EA09






OceanStore: An Architecture for Global-Scale Persistent Storage*

John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, 
Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, 
Hakim Weatherspoon, Westley Weimer, Chris Wells, and Ben Zhao

University of California, Berkeley 
http://oceanstore.cs.berkeley.edu



ABSTRACT 

OceanStore is a utility infrastructure designed to span the globe 
and provide continuous access to persistent information. Since 
this infrastructure is comprised of untrusted servers, data is protected
through redundancy and cryptographic techniques. To improve
performance, data is allowed to be cached anywhere, anytime.
Additionally, monitoring of usage patterns allows adaptation
to regional outages and denial of service attacks; monitoring
also enhances performance through pro-active movement of data.
A prototype implementation is currently under development. 

1 INTRODUCTION 

In the past decade we have seen astounding growth in the perfor- 
mance of computing devices. Even more significant has been the 
rapid pace of miniaturization and related reduction in power con- 
sumption of these devices. Based on these trends, many envision 
a world of ubiquitous computing devices that add intelligence and 
adaptability to ordinary objects such as cars, clothing, books, and 
houses. Before such a revolution can occur, however, computing 
devices must become so reliable and resilient that they are com- 
pletely transparent to the user [50]. 

In pursuing transparency, one question immediately comes to 
mind: where does persistent information reside? Persistent infor- 
mation is necessary for transparency, since it permits the behavior 
of devices to be independent of the devices themselves, allowing 
an embedded component to be rebooted or replaced without losing 
vital configuration information. Further, the loss or destruction of a 
device does not lead to lost data. Note that a uniform infrastructure 
for accessing and managing persistent information can also pro- 
vide for transparent synchronization among devices. Maintaining 
the consistency of these devices in the infrastructure allows users 
to safely access the same information from many different devices 

*This research is supported by NSF career award #ANI-9985250, DARPA 
grant #N66001-99-2-8913, and DARPA grant #DABT63-96-C-0056.

Patrick Eaton is supported by a National Defense Science and Engineering 
Graduate Fellowship (NDSEG); Dennis Geels is supported by the Fannie
and John Hertz Foundation; and Hakim Weatherspoon is supported by an 
Intel Masters Fellowship. 

Copyright © A.C.M. 2000 1-58113-317-0/00/0011 ...$5.00

Permission to make digital or hard copies of part or all of this work for personal or 
classroom use is granted without fee provided that copies are not made or distributed 
for profit or commercial advantage and that copies bear this notice and the full cita- 
tion on the first page. Copyrights for components of this work owned by others than 
ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to 
republish, to post on servers, or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from Publications Dept. ACM Inc., fax +1
(212) 869-0481, or permissions@acm.org.

ASPLOS 2000 Cambridge, MA Nov. 12-15, 2000



simultaneously [38]. Today, such sharing often requires laborious, 
manual synchronization. 

Ubiquitous computing places several requirements on a persis- 
tent infrastructure. First, some form of (possibly intermittent) con- 
nectivity must be provided to computing devices, no matter how 
small. Fortunately, increasing levels of connectivity are being pro- 
vided to consumers through cable-modems, DSL, cell-phones and 
wireless data services. Second, information must be kept secure 
from theft and denial-of-service (DoS). Since we assume wide- 
scale connectivity, we need to take extra measures to make sure 
that information is protected from prying eyes and malicious hands. 
Third, information must be extremely durable. Therefore changes 
should be submitted to the infrastructure at the earliest possible mo- 
ment; sorting out the proper order for consistent commitment may 
come later. Further, archiving of information should be automatic
and reliable. 

Finally, information must be divorced from location. Central- 
ized servers are subject to crashes, DoS attacks, and unavailability 
due to regional network outages. Although bandwidth in the core 
of the Internet has been doubling at an incredible rate, latency has
not been improving as quickly. Further, connectivity at the leaves 
of the network is intermittent, of high latency, and of low band- 
width. Thus, to achieve uniform and highly-available access to in- 
formation, servers must be geographically distributed and should 
exploit caching close to (or within) clients. As a result, we envi- 
sion a model in which information is free to migrate to wherever it 
is needed, somewhat in the style of COMA shared memory multi- 
processors [21]. 

As a rough estimate, we imagine providing service to roughly
10^10 users, each with at least 10,000 files. OceanStore must therefore
support over 10^14 files.



1.1 OceanStore: a True Data Utility

We envision a cooperative utility model in which consumers pay 
a monthly fee in exchange for access to persistent storage. Such 
a utility should be highly-available from anywhere in the network, 
employ automatic replication for disaster recovery, use strong se- 
curity by default, and provide performance that is similar to that of 
existing LAN-based networked storage systems under many cir- 
cumstances. Services would be provided by a confederation of 
companies. Each user would pay their fee to one particular "util- 
ity provider", although they could consume storage and bandwidth 
resources from many different providers; providers would buy and 
sell capacity among themselves to make up the difference. Air- 
ports or small cafes could install servers on their premises to give 
customers better performance; in return they would get a small div- 
idend for their participation in the global utility. 






Ideally, a user would entrust all of his or her data to OceanStore; 
in return, the utility's economies of scale would yield much better 
availability, performance, and reliability than would be available 
otherwise. Further, the geographic distribution of servers would 
support deep archival storage, i.e. storage that would survive ma- 
jor disasters and regional outages. In a time when desktop worksta- 
tions routinely ship with tens of gigabytes of spinning storage, the 
management of data is far more expensive than the storage media. 
OceanStore hopes to take advantage of this excess of storage space 
to make the management of data seamless and carefree. 

1.2 Two Unique Goals 

The OceanStore system has two design goals that differentiate it 
from similar systems: (1) the ability to be constructed from an un- 
trusted infrastructure and (2) support of nomadic data. 

Untrusted Infrastructure: OceanStore assumes that the infras- 
tructure is fundamentally untrusted. Servers may crash without 
warning or leak information to third parties. This lack of trust is in- 
herent in the utility model and is different from other cryptographic 
systems such as [35]. Only clients can be trusted with cleartext — all 
information that enters the infrastructure must be encrypted. How- 
ever, rather than assuming that servers are passive repositories of 
information (such as in CFS [5]), we allow servers to be able to 
participate in protocols for distributed consistency management. To 
this end, we must assume that most of the servers are working cor- 
rectly most of the time, and that there is one class of servers that we 
can trust to carry out protocols on our behalf (but not trust with the 
content of our data). This responsible party is financially responsi- 
ble for the integrity of our data. 

Nomadic Data: In a system as large as OceanStore, locality is 
of extreme importance. Thus, we have as a goal that data can be 
cached anywhere, anytime, as illustrated in Figure 1. We call this
policy promiscuous caching. Data that is allowed to flow freely is 
called nomadic data. Note that nomadic data is an extreme con- 
sequence of separating information from its physical location. Al- 
though promiscuous caching complicates data coherence and loca- 
tion, it provides great flexibility to optimize locality and to trade off 
consistency for availability. To exploit this flexibility, continuous 
introspective monitoring is used to discover tacit relationships be- 
tween objects. The resulting "meta-information" is used for local- 
ity management. Promiscuous caching is an important distinction 
between OceanStore and systems such as NFS [43] and AFS [23] 
in which cached data is confined to particular servers in particular 
regions of the network. Experimental systems such as XFS [3] al- 
low "cooperative caching" [12], but only in systems connected by 
a fast LAN. 

The rest of this paper is as follows: Section 2 gives a system-level 
overview of the OceanStore system. Section 3 shows sample ap- 
plications of the OceanStore. Section 4 gives more architectural 
detail, and Section 5 reports on the status of the current prototype. 
Section 6 examines related work. Concluding remarks are given in 
Section 7. 

2 SYSTEM OVERVIEW 

An OceanStore prototype is currently under development. This sec- 
tion provides a brief overview of the planned system. Details on the 
individual system components are left to Section 4. 

The fundamental unit in OceanStore is the persistent object. 
Each object is named by a globally unique identifier, or GUID. 





Figure 1: The OceanStore system. The core of the system is 
composed of a multitude of highly connected "pools", among 
which data is allowed to "flow" freely. Clients connect to one or 
more pools, perhaps intermittently. 

Objects are replicated and stored on multiple servers. This replication
provides availability¹ in the presence of network partitions and
durability against failure and attack. A given replica is independent 
of the server on which it resides at any one time; thus we refer to 
them as floating replicas. 

A replica for an object is located through one of two mecha- 
nisms. First, a fast, probabilistic algorithm attempts to find the 
object near the requesting machine. If the probabilistic algorithm 
fails, location is left to a slower, deterministic algorithm. 
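
The two-tier lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the names `locate_replica`, `fast_probabilistic` and `slow_deterministic` are hypothetical, and the two dictionaries merely stand in for the real location algorithms, which this overview does not specify.

```python
from typing import Callable, Optional

# Hypothetical type: a lookup takes a GUID and returns a server name, or None.
Lookup = Callable[[str], Optional[str]]


def locate_replica(guid: str,
                   fast_probabilistic: Lookup,
                   slow_deterministic: Lookup) -> Optional[str]:
    """Try the fast local search first; fall back to the slower global one."""
    server = fast_probabilistic(guid)
    if server is not None:
        return server
    return slow_deterministic(guid)


# Toy stand-ins (assumptions): the fast tier only knows about nearby replicas,
# the slow tier knows about every replica in the system.
nearby = {"guid-a": "server-near"}
everywhere = {"guid-a": "server-far", "guid-b": "server-far"}

print(locate_replica("guid-a", nearby.get, everywhere.get))  # answered by the fast tier
print(locate_replica("guid-b", nearby.get, everywhere.get))  # falls through to the slow tier
```

The point of the ordering is that the common case (a replica is close by) never pays for the global search.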

Objects in the OceanStore are modified through updates. Up- 
dates contain information about what changes to make to an ob- 
ject and the assumed state of the object under which those changes 
were developed, much as in the Bayou system [13]. In principle, 
every update to an OceanStore object creates a new version². Consistency
based on versioning, while more expensive to implement
than update-in-place consistency, provides for cleaner recovery in 
the face of system failures [49]. It also obviates the need for backup 
and supports "permanent" pointers to information. 
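
As a rough illustration of updates that carry an assumed state and produce new versions, consider the sketch below. It is a simplification, not the OceanStore update format: the classes `Update` and `VersionedObject` are hypothetical, and the assumed state is reduced to a single version number.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Update:
    assumed_version: int              # state the client believed it was editing
    change: Callable[[str], str]      # the change to make to the object's data


@dataclass
class VersionedObject:
    # Version 0 is the empty object; old versions are never overwritten.
    versions: List[str] = field(default_factory=lambda: [""])

    def apply(self, update: Update) -> bool:
        """Commit the update as a new version only if its assumption still holds."""
        latest = len(self.versions) - 1
        if update.assumed_version != latest:
            return False              # stale assumption: reject, nothing is lost
        self.versions.append(update.change(self.versions[latest]))
        return True


obj = VersionedObject()
print(obj.apply(Update(0, lambda data: data + "hello")))   # True: version 1 created
print(obj.apply(Update(0, lambda data: data + "stale")))   # False: version 0 is no longer current
print(obj.versions)                                        # ['', 'hello']
```

Because every accepted update appends rather than overwrites, earlier versions stay readable, which is what makes recovery and "permanent" pointers straightforward in this toy model.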

OceanStore objects exist in both active and archival forms. An 
active form of an object is the latest version of its data together 
with a handle for update. An archival form represents a permanent, 
read-only version of the object. Archival versions of objects are 
encoded with an erasure code and spread over hundreds or thou- 
sands of servers [18]; since data can be reconstructed from any suf- 
ficiently large subset of fragments, the result is that nothing short 
of a global disaster could ever destroy information. We call this 
highly redundant data encoding deep archival storage. 
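
The property that data can be rebuilt from any sufficiently large subset of fragments can be demonstrated with a small polynomial (Reed-Solomon-style) code. This is a sketch under assumptions: the section does not say which erasure code is used, and the field size `P`, the function names, and the fragment layout below are all illustrative choices.

```python
P = 257  # assumption: a prime field just large enough to hold one byte per symbol


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < k through k points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: bytes, n: int):
    """Spread k = len(data) symbols over n <= P fragments of the form (x, y)."""
    base = list(enumerate(data))      # points (0, d0) ... (k-1, d_{k-1})
    return [(x, _interpolate(base, x)) for x in range(n)]


def decode(fragments, k: int) -> bytes:
    """Rebuild the original k symbols from any k distinct fragments."""
    points = fragments[:k]
    return bytes(_interpolate(points, x) for x in range(k))


fragments = encode(b"ocean", 8)                       # 5 data symbols, 8 fragments
survivors = [fragments[i] for i in (7, 2, 5, 0, 6)]   # any 5 of the 8 are enough
print(decode(survivors, 5))                           # b'ocean'
```

Here three of the eight fragments can be lost without losing data; spreading fragments over many servers with a larger n is what gives the archival form its durability.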

An application writer views the OceanStore as a number of ses- 
sions. Each session is a sequence of read and write requests related 
to one another through the session guarantees, in the style of the 
Bayou system [13]. Session guarantees dictate the level of con- 
sistency seen by a session's reads and writes; they can range from 
supporting extremely loose consistency semantics to supporting the 
ACID semantics favored in databases. In support of legacy code, 
OceanStore also provides an array of familiar interfaces such as the 
Unix file system interface and a simple transactional interface. 

¹ If application semantics allow it, this availability is provided at the expense
of consistency.

² In fact, groups of updates are combined to create new versions, and we
plan to provide interfaces for retiring old versions, as in the Elephant File
System [44].






Finally, given the flexibility afforded by the naming mechanism 
and to promote hands-off system maintenance, OceanStore exploits 
a number of dynamic optimizations to control the placement, num- 
ber, and migration of objects. We classify all of these optimizations 
under the heading of introspection, an architectural paradigm that
formalizes the automatic and dynamic optimization employed by 
"intelligent" systems. 

3 APPLICATIONS 

In this section we present applications that we are considering for 
OceanStore. While each of these applications can be constructed 
in isolation, OceanStore enables them to be developed more eas- 
ily and completely by providing a single infrastructure for their 
shared, difficult problems. These problems include consistency, security,
privacy, wide-scale data dissemination, dynamic optimization,
durable storage, and disconnected operation. OceanStore
solves these problems once, allowing application developers to fo- 
cus on higher-level concerns. 

One obvious class of applications for OceanStore is that of 
groupware and personal information management tools, such as 
calendars, email, contact lists, and distributed design tools. These 
applications are challenging to implement because they must allow 
for concurrent updates from many people. Further, they require 
that users see an ever-progressing view of shared information, even 
when conflicts occur. OceanStore's flexible update mechanism 
solves many of these problems. It provides ways to merge infor- 
mation and detect conflicts, as well as the infrastructure to dissemi- 
nate information to all interested parties. Additionally, OceanStore 
provides ubiquitous access to data so that any device can access the 
information from anywhere. 

Email is a particularly interesting groupware target for 
OceanStore. Although email applications appear mundane on the 
surface, their implementations are difficult because the obvious so- 
lution of filtering all messages through a single email server does 
not scale well, and distributed solutions have complicated internal 
consistency issues. For example, an email inbox may be simulta- 
neously written by numerous different users while being read by 
a single user. Further, some operations, such as message move 
operations, must occur atomically even in the face of concurrent
access from several clients to avoid data loss. In addition, email 
requires privacy and security by its very nature. OceanStore al- 
leviates the need for clients to implement their own locking and 
security mechanisms, while enabling powerful features such as no- 
madic email collections and disconnected operation. Introspection 
permits a user's email to migrate closer to his client, reducing the 
round trip time to fetch messages from a remote server. OceanStore 
enables disconnected operation through its optimistic concurrency 
model — users can operate on locally cached email even when dis- 
connected from the network; modifications are automatically dis- 
seminated upon reconnection. 

In addition to groupware applications, OceanStore can be used to 
create very large digital libraries and repositories for scientific data. 
Both of these applications require massive quantities of storage, 
which in turn require complicated management. OceanStore pro- 
vides a common mechanism for storing and managing these large 
data collections. It replicates data for durability and availability. Its 
deep archival storage mechanisms permit information to survive in 
the face of global disaster. Further, OceanStore benefits these ap- 
plications by providing for seamless migration of data to where it 
is needed. For example, OceanStore can quickly disseminate vast 
streams of data from physics laboratories to the researchers around 
the world who analyze such data. 



Finally, OceanStore provides an ideal platform for new stream- 
ing applications, such as sensor data aggregation and dissemina- 
tion. Many have speculated about the utility of data that will 
emanate from the plethora of small MEMS sensors in the future; 
OceanStore provides a uniform infrastructure for transporting, filtering,
and aggregating the huge volumes of data that will result.

4 SYSTEM ARCHITECTURE 

In this section, we will describe underlying technologies that sup- 
port the OceanStore system. We start with basic issues, such as 
naming and access control. We proceed with a description of the
data location mechanism, which must locate objects anywhere in 
the world. Next, we discuss the OceanStore update model and the 
issues involved with consistency management in an untrusted in- 
frastructure. After a brief word on the architecture for archival stor- 
age, we discuss the OceanStore API as presented to clients. Finally, 
we provide a description of the role of introspection in OceanStore. 

4.1 Naming 

At the lowest level, OceanStore objects are identified by a globally 
unique identifier (GUID), which can be thought of as a pseudo-random,
fixed-length bit string. Users of the system, however, will
clearly want a more accessible naming facility. To provide a facility 
that is both decentralized and resistant to attempts by adversaries to 
"hijack" names that belong to other users, we have adapted the idea 
of self-certifying path names due to Mazieres [35]. 

An object GUID is the secure hash 3 of the owner's key and some 
human-readable name. This scheme allows servers to verify an 
object's owner efficiently, which facilitates access checks and re- 
source accounting 4 . 

Certain OceanStore objects act as directories, mapping human- 
readable names to GUIDs. To allow arbitrary directory hierarchies 
to be built, we allow directories to contain pointers to other direc- 
tories. A user of the OceanStore can choose several directories as 
"roots" and secure those directories through external methods, such 
as a public key authority. Note, however, that such root directories 
are only roots with respect to the clients that use them; the system 
as a whole has no one root. This scheme does not solve the prob- 
lem of generating a secure GUID mapping, but rather reduces it to 
a problem of secure key lookup. We address this problem using the 
locally linked name spaces from the SDSI framework [1, 42]. 

Note that GUIDs identify a number of other OceanStore entities 
such as servers and archival fragments. The GUID for a server is a 
secure hash of its public key; the GUID for an archival fragment is 
a secure hash over the data it holds. As described in Section 4.3, en- 
tities in the OceanStore may be addressed directly by their GUID. 
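
The three GUID derivations described in this section can be sketched as follows. This is a minimal illustration, assuming SHA-1 (the hash the prototype is stated to use) and an assumed byte layout for combining the owner's key with the name; the text does not specify the layout, so the length prefix and the function names here are hypothetical.

```python
import hashlib

def object_guid(owner_public_key: bytes, name: str) -> bytes:
    """GUID of an object: secure hash of the owner's key and a readable name."""
    h = hashlib.sha1()
    # Assumed framing: length-prefix the key so that bytes cannot shift
    # between the key field and the name field and yield the same hash.
    h.update(len(owner_public_key).to_bytes(4, "big"))
    h.update(owner_public_key)
    h.update(name.encode("utf-8"))
    return h.digest()

def server_guid(server_public_key: bytes) -> bytes:
    """GUID of a server: secure hash of its public key."""
    return hashlib.sha1(server_public_key).digest()

def fragment_guid(fragment_data: bytes) -> bytes:
    """GUID of an archival fragment: secure hash over the data it holds."""
    return hashlib.sha1(fragment_data).digest()
```

Because the owner's key is an input to the hash, a server holding the key and the name can recompute the GUID and confirm ownership without consulting any central authority.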

4.2 Access control 

OceanStore supports two primitive types of access control, namely 
reader restriction and writer restriction. More complicated ac- 
cess control policies, such as working groups, are constructed from 
these two. 

Restricting readers: To prevent unauthorized reads, we encrypt 
all data in the system that is not completely public and distribute 
the encryption key to those users with read permission. To revoke
read permission, the owner must request that replicas be deleted or 
re-encrypted with the new key. A recently-revoked reader is able 

3 Our prototype system uses SHA-1 [37] for its secure hash. 
4 Note that each user might have more than one public key. They might also 
choose different public keys for private objects, public objects, and objects 
shared with various groups. 






to read old data from cached copies or from misbehaving servers 
that fail to delete or re-key; however, this problem is not unique 
to OceanStore. Even in a conventional system, there is no way to 
force a reader to forget what has been read. 

Restricting writers: To prevent unauthorized writes, we require 
that all writes be signed so that well-behaved servers and clients 
can verify them against an access control list (ACL). The owner 
of an object can securely choose the ACL x for an object foo by 
providing a signed certificate that translates to "Owner says use 
ACL x for object foo". The specified ACL may be another object 
or a value indicating a common default. An ACL entry extending 
privileges must describe the privilege granted and the signing key,
but not the explicit identity, of the privileged users. We make such 
entries publicly readable so that servers can check whether a write 
is allowed. We plan to adopt ideas from systems such as Taos and
PolicyMaker to allow users to express and reason formally about a 
wide range of possible policies [52, 6]. 

Note the asymmetry that has been introduced by encrypted data: 
reads are restricted at clients via key distribution, while writes are 
restricted at servers by ignoring unauthorized updates. 

4.3 Data Location and Routing 

Entities in the OceanStore are free to reside on any of the 
OceanStore servers. This freedom provides maximum flexibility 
in selecting policies for replication, availability, caching, and mi- 
gration. Unfortunately, it also complicates the process of locating 
and interacting with these entities. Rather than restricting the place- 
ment of data to aid in the location process, OceanStore tackles the 
problem of data location head-on. The paradigm is that of query 
routing, in which the network takes an active role in routing mes- 
sages to objects. 

4.3.1 Distributed Routing in OceanStore 

Every addressable entity in the OceanStore (e.g. floating replica, 
archival fragment, or client) is identified by one or more GUIDs. 
Entities that are functionally equivalent, such as different replicas 
for the same object, are identified by the same GUID. Clients in- 
teract with these entities with a series of protocol messages, as de- 
scribed in subsequent sections. To support location-independent 
addressing, OceanStore messages are labeled with a destination 
GUID, a random number, and a small predicate. The destination 
IP address does not appear in these messages. The role of the 
OceanStore routing layer is to route messages directly to the closest 
node that matches the predicate and has the desired GUID. 
In order to perform this routing process, the OceanStore networking
layer consults a distributed, fault-tolerant data structure that
explicitly tracks the location of all objects. Routing is thus a
two-phase process. Messages begin by routing from node to node along
the distributed data structure until a destination is discovered. At 
that point, they route directly to the destination. It is important to 
note that the OceanStore routing layer does not supplant IP routing, 
but rather provides additional functionality on top of IP. 

There are many advantages to combining data location and rout- 
ing in this way. First and foremost, the task of routing a particular 
message is handled by the aggregate resources of many different 
nodes. By exploiting multiple routing paths to the destination, this 
serves to limit the power of compromised nodes to deny service 
to a client. Second, messages route directly to their destination, 
avoiding the multiple round-trips that a separate data location and 
routing process would incur. Finally, the underlying infrastructure 
has more up-to-date information about the current location of en- 




Figure 2: The probabilistic query process. The replica at n1 is
looking for object X, whose GUID hashes to bits 0, 1, and 3. (1)
The local Bloom filter for n1 (rounded box) shows that it does
not have the object, but (2) its neighbor filter (unrounded box)
for n2 indicates that n2 might be an intermediate node en route
to the object. The query moves to n2, (3) whose Bloom filter
indicates that it does not have the document locally, (4a) that
its neighbor n4 doesn't have it either, but (4b) that its neighbor
n3 might. The query is forwarded to n3, (5) which verifies that
it has the object.



tities than the clients. Consequently, the combination of location 
and routing permits communication with "the closest" entity, rather
than an entity that the client might have heard of in the past. If repli- 
cas move around, only the network, not the users of the data, needs 
to know. 

The mechanism for routing is a two-tiered approach featuring a 
fast, probabilistic algorithm backed up by a slower, reliable hier- 
archical method. The justification for this two-level hierarchy is 
that entities that are accessed frequently are likely to reside close to 
where they are being used; mechanisms to ensure this locality are 
described in Section 4.7. Thus, the probabilistic algorithm routes 
to entities rapidly if they are in the local vicinity. If this attempt 
fails, a large-scale hierarchical data structure in the style of Plaxton 
et al. [40] locates entities that cannot be found locally. We will
describe these two techniques in the following sections. 

4.3.2 Attenuated Bloom Filters 

The probabilistic algorithm is fully distributed and uses a con- 
stant amount of storage per server. It is based on the idea of hill- 
climbing; if a query cannot be satisfied by a server, local infor- 
mation is used to route the query to a likely neighbor. A modified 
version of a Bloom filter [7], called an attenuated Bloom filter, is
used to implement this potential function. 

An attenuated Bloom filter of depth D can be viewed as an ar- 
ray of D normal Bloom filters. In the context of our algorithm, the 
first Bloom filter is a record of the objects contained locally on the 
current node. The i-th Bloom filter is the union of all of the Bloom
filters for all of the nodes a distance i through any path from the 
current node. An attenuated Bloom filter is stored for each directed 
edge in the network. A query is routed along the edge whose filter 
indicates the presence of the object at the smallest distance. This 
process is illustrated in Figure 2. Our current metric of distance 
is hop-count, but in the future we hope to include a more precise 
measure corresponding roughly to latency. Also, "reliability fac- 
tors" can be applied locally to increase the distance to nodes that 
have abused the protocol in the past, automatically routing around 
certain classes of attacks. 
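
The edge-selection step can be sketched as follows. This is a minimal illustration under stated assumptions: the filter size `M`, the number of hash functions `K`, the SHA-1-based bit derivation, and the convention that level 0 of an edge's filter describes the neighbor itself are all choices made for the sketch, not details given in the text.

```python
import hashlib

M = 64  # bits per Bloom filter (illustrative size)
K = 3   # bit positions set per GUID (illustrative)

def bit_positions(guid: bytes) -> list:
    """Derive K bit positions from a GUID by hashing it with a counter."""
    return [
        int.from_bytes(hashlib.sha1(bytes([i]) + guid).digest()[:4], "big") % M
        for i in range(K)
    ]

def bloom_of(guids) -> int:
    """Build a Bloom filter, stored as an integer bit mask, over some GUIDs."""
    bits = 0
    for g in guids:
        for p in bit_positions(g):
            bits |= 1 << p
    return bits

def may_contain(bloom: int, guid: bytes) -> bool:
    """True if every bit for this GUID is set (false positives are possible)."""
    return all((bloom >> p) & 1 for p in bit_positions(guid))

def best_edge(edge_filters: dict, guid: bytes):
    """Pick the edge whose attenuated filter matches at the smallest depth.

    edge_filters maps a neighbor name to a list of D Bloom filters, where
    entry d summarizes the objects d hops beyond that neighbor.
    Returns (neighbor, depth), or None when no level of any filter matches.
    """
    best = None
    for neighbor, levels in edge_filters.items():
        for depth, bloom in enumerate(levels):
            if may_contain(bloom, guid):
                if best is None or depth < best[1]:
                    best = (neighbor, depth)
                break  # deeper levels of this edge cannot improve on this one
    return best
```

A `None` result corresponds to the case where the probabilistic algorithm fails and the query falls back to the global algorithm.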







Figure 3: A portion of the global mesh, rooted at node 4598. 
Paths from any node to the root of any tree can be traversed 
by resolving the root's ID one digit at a time; the bold arrow 
shows a route from node 0325 to node 4598. Data location uses 
this structure. Note that most object searches do not travel all 
the way to the root (see text). 



4.3.3 The Global Algorithm: Wide-scale Distributed
Data Location

The global algorithm for the OceanStore is a variation on Plax- 
ton et al.'s randomized hierarchical distributed data structure [40], 
which embeds multiple random trees in the network. Although 
OceanStore uses a highly-redundant version of this data structure, 
it is instructive to understand the basic Plaxton scheme. In that 
scheme, every server in the system is assigned a random (and 
unique) node-ID. These node-IDs are then used to construct a mesh 
of neighbor links, as shown in Figure 3. In this figure, each link is 
labeled with a level number that denotes the stage of routing that 
uses this link. In the example, the links are constructed by taking 
each node-ID and dividing it into chunks of four bits. The Nth level
neighbor-links for some Node X point at the 16 closest neighbors 5
whose node-IDs match the lowest N-1 nibbles of Node X's ID and
who have different combinations of the Nth nibble; one of these
links is always a loopback link. If a link cannot be constructed be- 
cause no such node meets the proper constraints, then the scheme 
chooses the node that matches the constraints as closely as possible. 
This process is repeated for all nodes and levels within a node. 

The key observation to make from Figure 3 is that the links 
form a series of random embedded trees, with each node as the
root of one of these trees. As a result, the neighbor links can be 
used to route from anywhere to a given node, simply by resolving 
the node's address one link at a time: first a level-one link, then
a level-two link, etc. To use this structure for data location, we 
map each object to a single node whose node-ID matches the ob- 
ject's GUID in the most bits (starting from the least significant); 
call this node the object's root. If information about the GUID 
(such as its location) were stored at its root, then anyone could
find this information simply by following neighbor links until they
reached the root node for the GUID. As described, this scheme has 
nice load distribution properties, since GUIDs become randomly 
mapped throughout the infrastructure. 
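
Digit-by-digit resolution can be sketched as follows. This is a simplified illustration, assuming every node can see the full set of node-IDs, whereas a real mesh keeps only per-level neighbor links and prefers the closest neighbor. The node-IDs other than 0325 and 4598 are illustrative, and the tie-breaking rules are assumptions of the sketch.

```python
def shared_suffix_len(a: str, b: str) -> int:
    """Number of trailing hex digits (nibbles) two IDs have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def route(nodes: set, source: str, dest: str) -> list:
    """Route toward dest, resolving at least one more low-order digit per hop.

    dest must be a member of nodes, and all IDs must have the same length.
    """
    path = [source]
    current = source
    while current != dest:
        level = shared_suffix_len(current, dest) + 1
        # Candidates match dest in at least `level` trailing digits. The
        # sketch takes the one resolving the fewest extra digits, ties by ID.
        candidates = [n for n in nodes if shared_suffix_len(n, dest) >= level]
        current = min(candidates, key=lambda n: (shared_suffix_len(n, dest), n))
        path.append(current)
    return path

def object_root(nodes: set, guid: str) -> str:
    """The node whose ID matches the GUID in the most low-order digits."""
    return max(nodes, key=lambda n: (shared_suffix_len(n, guid), n))

nodes = {"0325", "B4F8", "9098", "7598", "4598", "1D76"}
print(route(nodes, "0325", "4598"))
# prints ['0325', 'B4F8', '9098', '7598', '4598']
```

Each hop fixes one more trailing digit of the destination (8, then 98, then 598, then 4598), so the path length is bounded by the number of digits in an ID.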



5 "Closest" means with respect to the underlying IP routing infrastructure. 
Roughly speaking, the measurement metric is the time to route via IP.



This random distribution would appear to reduce locality; how- 
ever, the Plaxton scheme achieves locality as follows: when a 
replica is placed somewhere in the system, its location is "pub- 
lished" to the routing infrastructure. The publishing process works 
its way to the object's root and deposits a pointer at every hop along 
the way. This process requires O(log n) hops, where n is the number
of servers in the world. When someone searches for information,
they climb the tree until they run into a pointer, after which
they route directly to the object. In [40], the authors show that the 
average distance traveled is proportional to the distance between 
the source of the query and the closest replica that satisfies this 
query. 

Achieving Fault Tolerance: The basic scheme described above is 
sensitive to a number of different failures. First, each object has a 
single root, which becomes a single point of failure, the potential 
subject of denial of service attacks, and an availability problem. 
OceanStore addresses this weakness in a simple way: it hashes 
each GUID with a small number of different salt values. The re- 
sult maps to several different root nodes, thus gaining redundancy 
and simultaneously making it difficult to target a single node with 
a denial of service attack against a range of GUIDs. 
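
The salting step can be sketched as follows. This is a minimal illustration, assuming SHA-1 and simple concatenation of salt and GUID; the text does not say how the salt is combined or how many salts are used, so both are hypothetical here.

```python
import hashlib

def salted_guids(guid: bytes, salts: list) -> list:
    """Hash one GUID with several salts, yielding one derived ID per salt.

    Each derived ID would then be mapped to its own root node, so the
    object has as many independent roots as there are salts.
    """
    return [hashlib.sha1(salt + guid).digest() for salt in salts]

# Illustrative use: three salts give three independent root lookups.
derived = salted_guids(b"\x01" * 20, [b"0", b"1", b"2"])
```

Since the derived IDs are unrelated to one another, an attacker who disables the root for one salt gains nothing toward locating the roots for the others.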

A second problem with the above scheme is sensitivity to cor- 
ruption in the links and pointers. An important observation, how- 
ever, is that the above structure has sufficient redundancy to tol- 
erate small amounts of corruption. Bad links can be immediately 
detected, and routing can be continued by jumping to a random 
neighbor node 6 . To increase this redundancy, the OceanStore loca- 
tion structure supplements the basic links of the above scheme with 
additional neighbor links. Further, the infrastructure continually 
monitors and repairs neighbor links (a form of introspection — see 
Section 4.7), and servers slowly repeat the publishing process to 
repair pointers. 

The Advantages of Distributed Information: The advantages of 
a Plaxton-like data structure in the OceanStore are many. First, it 
is a highly redundant and fault-tolerant structure that spreads data 
location load evenly while finding local objects quickly. The com- 
bination of the probabilistic and global algorithms should comfort- 
ably scale to millions of servers. Second, the aggregate informa- 
tion contained in this data structure is sufficient to recognize which 
servers are down and to identify data that must be reconstructed 
when a server is permanently removed. This feature is impor- 
tant for maintaining a minimum level of redundancy for the deep 
archival storage. Finally, the Plaxton links form a natural substrate 
on which to perform network functions such as admission control 
and multicast. 

Achieving Maintenance-Free Operation: While existing work 
on Plaxton-like data structures did not include algorithms for on- 
line creation and maintenance of the global mesh, we have pro- 
duced recursive node insertion and removal algorithms. These 
make use of the redundant neighbor links mentioned above. Fur- 
ther, we have generalized our publication algorithm to support 
replicated roots, which remove single points of failure in data
location. Finally, we have optimized failure modes by using soft-state
beacons to detect faults more quickly, time-to-live fields to
react better to routing updates, and a second-chance algorithm to 
minimize the cost of recovering lost nodes. This information is 
coupled with continuous repair mechanisms that recognize when 



6 Each tree spans every node, hence any node should be able to reach the 
root. 






servers have been down for a long time and need to have their data 
reconstructed 7 . The practical implication of this work is that the 
OceanStore infrastructure as a whole automatically adapts to the 
presence or absence of particular servers without human intervention,
greatly reducing the cost of management.

4.4 Update Model 

Several of the applications described in Section 3 exhibit a high 
degree of write sharing. To allow for concurrent updates while 
avoiding many of the problems inherent with wide-area locking, 
OceanStore employs an update model based on conflict resolu- 
tion. Conflict resolution was introduced in the Bayou system [13] 
and supports a range of consistency semantics — up to and includ- 
ing ACID semantics. Additionally, conflict resolution reduces the 
number of aborts normally seen in detection-based schemes such 
as optimistic concurrency control [29].

Although flexible, conflict resolution requires the ability to per- 
form server-side computations on data. In an untrusted infrastruc- 
ture, replicas have access only to ciphertext, and no one server is 
trusted to perform commits. Both of these issues complicate the up- 
date architecture. However, the current OceanStore design is able 
to handle many types of conflict resolution directly on encrypted 
data. The following paragraphs describe the issues involved and 
our progress towards solving them. 

4.4.1 Update Format and Semantics

Changes to data objects within OceanStore are made by client- 
generated updates, which are lists of predicates associated with ac- 
tions. The semantics of an update are as follows: to apply an update 
against a data object, a replica evaluates each of the update's predi- 
cates in order. If any of the predicates evaluates to true, the actions 
associated with the earliest true predicate are atomically applied to 
the data object, and the update is said to commit. Otherwise, no 
changes are applied, and the update is said to abort. The update 
itself is logged regardless of whether it commits or aborts. 
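
These semantics can be sketched as follows. This is a minimal single-replica illustration; the class names, the staged-copy approach to atomicity, and the version increment on commit are assumptions of the sketch rather than details from the text.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class DataObject:
    version: int = 0
    blocks: list = field(default_factory=list)
    log: list = field(default_factory=list)  # (update, committed) pairs

Predicate = Callable[[DataObject], bool]
Action = Callable[[list], None]  # mutates a staged copy of the blocks

@dataclass
class Update:
    clauses: list  # ordered (predicate, [actions]) pairs

def apply_update(obj: DataObject, update: Update) -> bool:
    """Apply the actions of the earliest true predicate; True means commit."""
    committed = False
    for predicate, actions in update.clauses:
        if predicate(obj):
            staged = list(obj.blocks)  # work on a copy so the swap is atomic
            for action in actions:
                action(staged)
            obj.blocks = staged
            obj.version += 1           # assumed: a commit bumps the version
            committed = True
            break                      # later predicates are not evaluated
    obj.log.append((update, committed))  # logged on commit and on abort
    return committed

# A compare-version predicate guarding an append action.
obj = DataObject(version=3, blocks=[b"a"])
upd = Update(clauses=[(lambda o: o.version == 3,
                       [lambda blocks: blocks.append(b"b")])])
first = apply_update(obj, upd)   # commits: the version is 3
second = apply_update(obj, upd)  # aborts: the version is now 4
```

Replaying the same update aborts the second time because the guarding predicate no longer holds, which is how a stale concurrent write is rejected.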

Note that OceanStore update semantics are similar to those of 
the Bayou system, except that we have eliminated the merge pro- 
cedure used there, since arbitrary computations and manipulations 
on ciphertext are still intractable. Nevertheless, we preserve the 
key functionality of their model, which they found to be expressive 
enough for a number of sample applications including a group cal- 
endar, a shared bibliographic database, and a mail application [14]. 
Furthermore, the model can be applied to other useful applications.
For instance, Coda [26] provided specific merge procedures for 
conflicting updates of directories; this type of conflict resolution is 
easily supported under our model. Slight extensions to the model 
can support Lotus Notes-style conflict resolution, where unresolv- 
able conflicts result in a branch in the object's version stream [25]. 
Finally, the model can be used to provide ACID semantics: the 
first predicate is made to check the read set of a transaction, the 
corresponding action applies the write set, and there are no other 
predicate-action pairs. 

4.4.2 Extending the Model to Work over Ciphertext

OceanStore replicas are not trusted with unencrypted information. 
This complicates updates by restricting the set of predicates that 
replicas can compute and the set of actions they are able to apply. 
However, the following predicates are currently possible: compare- 
version, compare-size, compare-block, and search. The first two 
predicates are trivial since they are over the unencrypted meta-data 



[Figure 4 diagram: blocks 41, 42, and 43 before the insert; blocks 41, 43, 41.5, and 42 after it.]

Figure 4: Block insertion on ciphertext. The client wishes to
insert block 41.5, so she appends it and block 42 to the object,
then replaces the old block 42 with a block pointing to the two
appended blocks. The server learns nothing about the contents
of any of the blocks.



of the object. The compare-block operation is easy if the encryption 
technology is a position-dependent block cipher: the client simply 
computes a hash of the encrypted block and submits it along with 
the block number for comparison. Perhaps the most impressive 
of these predicates is search, which can be performed directly on 
ciphertext [47]; this operation reveals only that a search was per- 
formed along with the boolean result. The cleartext of the search 
string is not revealed, nor can the server initiate new searches on its 
own. 

In addition to these predicates, the following operations can be 
applied to ciphertext: replace-block, insert-block, delete-block, and 
append. Again assuming a position-dependent block cipher, the 
replace-block and append operations are simple for the same rea- 
sons as compare-block. 

The last two operations, insert-block and delete-block, can be 
performed by grouping blocks of the object into two sets, index 
blocks and data blocks, where index blocks contain pointers to 
other blocks elsewhere in the object. To insert, one replaces the 
block at the insertion point with a new block that points to the old 
block and the inserted block, both of which are appended to the ob- 
ject. This scheme is illustrated in Figure 4. To delete, one replaces 
the block in question with an empty pointer block. Note that this 
scheme leaks a small amount of information and thus might be sus- 
ceptible to compromise by a traffic-analysis attack; users uncom- 
fortable with this leakage can simply append encrypted log records 
to an object and rely on powerful clients to occasionally generate 
and re-encrypt the object in whole from the logs. 
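
The index-block scheme can be sketched as follows. This is a minimal model of the logical structure only: in the scheme described, pointer blocks would themselves be encrypted, and the server would see nothing but append and replace-block operations. The class names and the `top_level` parameter are assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Data:
    ciphertext: bytes  # opaque to the server

@dataclass
class Pointer:
    targets: list      # indices of other blocks in the same object

Block = Union[Data, Pointer]

def insert_block(blocks: list, at: int, new: Data) -> None:
    """Insert `new` before the block at index `at` without shifting blocks."""
    old = blocks[at]
    blocks.append(new)  # the inserted block is appended
    blocks.append(old)  # the displaced block is appended after it
    blocks[at] = Pointer([len(blocks) - 2, len(blocks) - 1])

def delete_block(blocks: list, at: int) -> None:
    """Delete by replacing the block with an empty pointer block."""
    blocks[at] = Pointer([])

def read_in_order(blocks: list, top_level: int) -> list:
    """Client-side view: follow pointers to recover the logical order.

    top_level is the number of leading positions that form the object
    proper; blocks past it are reachable only through pointers.
    """
    out = []

    def visit(i: int) -> None:
        block = blocks[i]
        if isinstance(block, Pointer):
            for t in block.targets:
                visit(t)
        else:
            out.append(block.ciphertext)

    for i in range(top_level):
        visit(i)
    return out

blocks = [Data(b"41"), Data(b"42"), Data(b"43")]
insert_block(blocks, 1, Data(b"41.5"))
print(read_in_order(blocks, 3))
# prints [b'41', b'41.5', b'42', b'43']
```

No existing block changes position, so a position-dependent cipher never has to re-encrypt data that was already stored.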

The schemes presented in this section clearly impact the format 
of objects. However, these schemes are the subject of ongoing re- 
search; more flexible techniques will doubtless follow. 

4.4.3 Serializing Updates in an Untrusted
Infrastructure

The process of conflict resolution starts with a series of updates, 
chooses a total order among them, then applies them atomically 
in that order. The easiest way to compute this order is to require 
that all updates pass through a master replica. Unfortunately, trust- 
ing any one replica to perform this task is incompatible with the 
untrusted infrastructure assumption on which OceanStore is built. 
Thus, we replace this master replica with a primary tier of replicas.
These replicas cooperate with one another in a Byzantine agree- 
ment protocol [30] to choose the final commit order for updates 8 . 
A secondary tier of replicas communicates among themselves and 
the primary tier via an enhanced epidemic algorithm, as in Bayou. 

The decision to use two classes of floating replicas is motivated 
by several considerations. First, all known protocols that are toler- 



7 Note that the read-only nature of most of the information in the OceanStore 
makes this reconstruction particularly easy; see Section 4.5. 



8 A Byzantine agreement protocol is one in which we assume that no more
than m of the total n = 3m + 1 replicas are faulty.




Figure 5: The path of an update. (a) After generating an update, a client sends it directly to the object's primary tier, as well as
to several other random replicas for that object. (b) While the primary tier performs a Byzantine agreement protocol to commit
the update, the secondary replicas propagate the update among themselves epidemically. (c) Once the primary tier has finished its
agreement protocol, the result of the update is multicast down the dissemination tree to all of the secondary replicas.



ant to arbitrary replica failures are too communication-intensive to 
be used by more than a handful of replicas. The primary tier thus 
consists of a small number of replicas located in high-bandwidth, 
high-connectivity regions of the network 9 . To allow for later, off- 
line verification by a party who did not participate in the protocol, 
we are exploring the use of proactive signature techniques [4] to 
certify the result of the serialization process. We hope to extend the 
protocol in [10] to use such techniques. 

Some applications may gain performance or availability by re- 
quiring a lesser degree of consistency than ACID semantics. These 
applications motivate the secondary tier of replicas in OceanStore. 
Secondary replicas do not participate in the serialization protocol, 
may contain incomplete copies of an object's data, and can be more 
numerous than primary replicas. They are organized into one or 
more application-level multicast trees, called dissemination trees, 
that serve as conduits of information between the primary tier and 
secondary tier. Among other things, the dissemination trees push 
a stream of committed updates to the secondary replicas, and they 
serve as communication paths along which secondary replicas pull 
missing information from parents and primary replicas. This ar- 
chitecture permits dissemination trees to transform updates into in- 
validations as they progress downward; such a transformation is 
exploited at the leaves of the network where bandwidth is limited. 

Secondary replicas contain both tentative 10 and committed data. 
They employ an epidemic-style communication pattern to quickly 
spread tentative commits among themselves and to pick a tentative 
serialization order. To increase the chances that this tentative or- 
der will match the final ordering chosen by the primary replicas, 
clients optimistically timestamp their updates. Secondary repli- 
cas order tentative updates in timestamp order, and the primary tier 
uses these same timestamps to guide its ordering decisions. Since 
the serialization decisions of the secondary tier are tentative, they 
may be safely decided by untrusted replicas; applications requiring 
stronger consistency guarantees must simply wait for their updates 
to reach the primary tier. 

4.4.4 A Direct Path to Clients and Archival Storage

The full path of an update is shown in Figure 5. Note that this 
path is optimized for low latency and high throughput. Under ideal 

9 The choice of which replicas to include in the primary tier is left to the 
client's responsible party, which must ensure that its chosen group satisfies 
the Byzantine assumption mentioned above. 

10 Tentative data is data that the primary replicas have not yet committed.



circumstances, updates flow directly from the client to the primary 
tier of servers, where they are serialized and then multicast to the 
secondary servers. All of the messages shown here are addressed 
through GUIDs, as described in Section 4.3. Consequently, the 
update protocol operates entirely without reference to the physical 
location of replicas. 

One important aspect of OceanStore that differs from existing 
systems is the fact that the archival mechanisms are tightly coupled 
with update activity. After choosing a final order for updates, the 
inner tier of servers signs the result and sends it through the dis- 
semination tree. At the same time, these servers generate encoded, 
archival fragments and distribute them widely. Consequently, up- 
dates are made extremely durable as a direct side-effect of the com- 
mitment process. Section 4.5 discusses archival storage in more 
detail. 

4.4.5 Efficiency of the Consistency Protocol

There are two main points of interest when considering the effi- 
ciency of the consistency protocol: the amount of network band- 
width the protocol demands, and the latency between when an up- 
date is created and when the client receives notification that it has 
committed or aborted. Assuming that a Byzantine agreement protocol
like that in [10] is used, the total cost of an update in bytes sent
across the network, b, is given by the equation:

$$b = c_1 n^2 + (u + c_2) n + c_3$$

where u is the size of the update, n is the number of replicas in
the primary tier, and c_1, c_2, and c_3 are the sizes of small protocol
messages. While this equation appears to be dominated by the n^2
term, the constant c_1 is quite small, on the order of 100 bytes. Thus
for sufficiently small n and large updates, the equation is dominated
by the n term. Since there are n replicas, the minimum amount of
bytes that must be transferred to keep all replicas up to date is un.

Figure 6 shows the cost of an update, normalized to this minimum
amount, as a function of update size. Note that for m = 4 and
n = 13, the normalized cost approaches 1 for update sizes around
100k bytes, but it approaches 2 at update sizes of only around 4k
bytes. 11 Thus for updates of 4k bytes or more, our system uses less
than double the minimum amount of network bandwidth necessary 
to keep all the replicas in the primary tier up to date. 
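
The shape of the normalized cost curve can be explored with a short sketch. The text gives only the order of magnitude of the first protocol-message constant (about 100 bytes); the values used below for all three constants are placeholders, so the numbers this sketch produces are not expected to reproduce Figure 6.

```python
def update_cost(u: int, n: int, c1: int = 100, c2: int = 100, c3: int = 100) -> int:
    """Bytes sent for one update: c1*n^2 + (u + c2)*n + c3.

    u is the update size in bytes and n the number of primary replicas.
    The default constants are placeholders, not measured values.
    """
    return c1 * n**2 + (u + c2) * n + c3

def normalized_cost(u: int, n: int, **constants) -> float:
    """Cost relative to the minimum u*n bytes needed to reach every replica."""
    return update_cost(u, n, **constants) / (u * n)

# The quadratic term matters less and less as the update grows.
for size in (1024, 4096, 102400):
    print(size, round(normalized_cost(size, 13), 3))
```

Whatever constants are chosen, the ratio falls toward 1 as u grows, because the fixed protocol overhead is amortized over a larger payload.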



11 Recall that m is the number of faulty replicas tolerated by the Byzantine
agreement protocol.



[Figure 6 plot. X axis: Update Size (k), 0.1 to 10000. Curves: m=2, n=7; m=3, n=10; m=4, n=13.]

Figure 6: The cost of an update in bytes sent across the net- 
work, normalized to the minimum cost needed to send the up- 
date to each of the replicas. 

Unfortunately, latency estimates for the consistency protocol are 
more difficult to come by without a functioning prototype. For this 
reason, suffice it to say that there are six phases of messages
in the protocol we have described. Assuming latency of messages 
over the wide area dominates computation time and that each mes- 
sage takes 100ms, we have an approximate latency per update of 
less than a second. We believe this latency is reasonable, but we 
will need to complete our prototype system before we can verify 
the accuracy of this rough estimate. 

4.5 Deep Archival Storage 

The archival mechanism of OceanStore employs erasure codes,
such as interleaved Reed-Solomon codes [39] and Tornado
codes [32]. Erasure coding is a process that treats input data as
a series of fragments (say n) and transforms these fragments into
a greater number of fragments (say 2n or 4n). As mentioned in
Section 4.4, the fragments are generated in parallel by the inner tier 
of servers during the commit process. The essential property of the 
resulting code is that any n of the coded fragments are sufficient to 
construct the original data 12 . 

Assuming that we spread coded fragments widely, it is very un- 
likely that enough servers will be down to prevent the recovery of 
data. We call this argument deep archival storage. A simple exam- 
ple will help illustrate this assertion. Assuming uncorrelated faults 
among machines, one can calculate the reliability at a given instant 
of time according to the following formula: 

$$P = \sum_{i=0}^{r_f} \frac{\binom{m}{i} \binom{n-m}{f-i}}{\binom{n}{f}}$$

where P is the probability that a document is available, n is the
number of machines, m is the number of currently unavailable
machines, f is the number of fragments per document, and r_f is
the maximum number of unavailable fragments that still allows the 
document to be retrieved. For instance, with a million machines, 
ten percent of which are currently down, simple replication with- 
out erasure codes provides only two nines (0.99) of reliability. A 
1/2-rate erasure coding of a document into 16 fragments gives the 
document over five nines of reliability (0.999994), yet consumes 
the same amount of storage. With 32 fragments, the reliability 
increases by another factor of 4000, supporting the assertion that 

12 Tornado codes, which are faster to encode and decode, require slightly 
more than n fragments to reconstruct the information. 



fragmentation increases reliability. This is a consequence of the 
law of large numbers. 
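
The reliability figures quoted above can be checked numerically. The sketch below assumes the availability formula is the hypergeometric sum over 0..r_f unavailable fragments, with n machines, m of them down, f fragments per document, and r_f the largest number of lost fragments that still permits retrieval. The function name `availability` is ours, and reading "1/2-rate coding into 16 fragments" as r_f = 8 is an assumption.

```python
from fractions import Fraction
from math import comb


def availability(n: int, m: int, f: int, r_f: int) -> float:
    """Probability that at most r_f of a document's f fragments are unavailable.

    Assumes each fragment sits on a distinct machine chosen uniformly at
    random from n machines, m of which are down, with uncorrelated faults.
    """
    total = comb(n, f)  # ways to place f fragments on n machines
    ok = sum(comb(m, i) * comb(n - m, f - i) for i in range(r_f + 1))
    return float(Fraction(ok, total))  # exact ratio, rounded once at the end


# One million machines, ten percent of them down.
n, m = 1_000_000, 100_000
print(availability(n, m, 2, 1))    # two full replicas: either one suffices
print(availability(n, m, 16, 8))   # 1/2-rate code, 16 fragments
print(availability(n, m, 32, 16))  # 1/2-rate code, 32 fragments
```

All three configurations store twice the size of the original data; only the number of fragments changes.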

To preserve the erasure nature of the fragments (meaning that 
a fragment is either retrieved correctly and completely, or not at 
all), we use a hierarchical hashing method to verify each fragment. 
We generate a hash over each fragment, and recursively hash over 
the concatenation of pairs of hashes to form a binary tree. Each 
fragment is stored along with the hashes neighboring its path to the
root. When it is retrieved, the requesting machine may recalculate
the hashes along that path. We can use the top-most hash as the
GUID to the immutable archival object, making every fragment in
the archive completely self-verifying. 
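
A minimal sketch of this hierarchical hashing scheme follows. It assumes SHA-256 as the hash function and a power-of-two number of fragments; neither choice comes from the text, and the helper names (`build_levels`, `sibling_path`, `verify`) are hypothetical.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    # SHA-256 is an assumption; the text does not name a hash function.
    return hashlib.sha256(data).digest()


def build_levels(fragments: List[bytes]) -> List[List[bytes]]:
    """Hash each fragment, then hash pairs of hashes up to a single root.

    Assumes len(fragments) is a power of two.
    """
    level = [h(frag) for frag in fragments]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def sibling_path(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Hashes neighboring the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # the sibling at this level
        index //= 2
    return path


def verify(fragment: bytes, index: int, path: List[bytes], root: bytes) -> bool:
    """Recompute the hashes along the path and compare with the root."""
    digest = h(fragment)
    for sibling in path:
        if index % 2 == 0:
            digest = h(digest + sibling)
        else:
            digest = h(sibling + digest)
        index //= 2
    return digest == root
```

A fragment stored together with its sibling path can then be checked by anyone who knows the root hash, without fetching the other fragments.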

For the user, we provide a naming syntax which explicitly in- 
corporates version numbers. Such names can be included in other 
documents as a form of permanent hyper-link. In addition, inter- 
faces will exist to examine modification history and to set version- 
ing policies [44]. Although in principle every version of every ob- 
ject is archived, clients can choose to produce versions less fre- 
quently. Archival copies are also produced when objects are idle 
for a long time or before objects become inactive. When generat- 
ing archival fragments, the floating replicas of an object participate 
together: they each generate a disjoint subset of the fragments and 
disseminate them into the infrastructure. 

To maximize the survivability of archival copies, we identify 
and rank administrative domains by their reliability and trustwor- 
thiness. We avoid dispersing all of our fragments to locations that 
have a high correlated probability of failure. Further, the number of 
fragments (and hence the durability of information) is determined 
on a per-object basis. OceanStore contains processes that slowly 
sweep through all existing archival data, repairing or increasing the
level of replication to further increase durability. 

To reconstruct archival copies, OceanStore sends out a request 
keyed off the GUID of the archival versions. Note that we can 
make use of excess capacity to insulate ourselves from slow servers 
by requesting more fragments than we absolutely need and recon- 
structing the data as soon as we have enough fragments. As the 
request propagates up the location tree (Section 4.3), fragments are 
discovered and sent to the requester. This search has nice locality 
properties since closer fragments tend to be discovered first. 
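
The benefit of requesting more fragments than strictly needed can be illustrated with a small model. This is only a sketch under simplifying assumptions: requests go out in parallel, each server's response time is known in advance, and a dropped request is modeled as `None`. The function name and the sample latencies are hypothetical.

```python
import heapq
from typing import List, Optional


def time_to_reconstruct(
    latencies: List[Optional[float]], needed: int, requested: int
) -> Optional[float]:
    """Time at which `needed` fragments have arrived.

    The first `requested` servers in `latencies` are contacted in parallel.
    Returns None if fewer than `needed` fragments ever arrive.
    """
    arrived = [t for t in latencies[:requested] if t is not None]
    if len(arrived) < needed:
        return None
    # Reconstruction can start once the `needed`-th fastest reply is in.
    return heapq.nsmallest(needed, arrived)[-1]


# Hypothetical response times in milliseconds; None is a dropped request.
latencies = [5.0, 120.0, 7.0, None, 9.0, 6.0, 8.0, 300.0]
for requested in (4, 5, 6):
    print(requested, time_to_reconstruct(latencies, needed=4, requested=requested))
```

With exactly four requests, the dropped one stalls reconstruction; each extra request lets a slow or missing reply be replaced by a faster one.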

4.6 The OceanStore API 

OceanStore draws much strength from its global scale, wide dis- 
tribution, epidemic propagation method, and flexible update pol- 
icy. The system as a whole can have rather complicated behavior. 
However, the OceanStore application programming interface (API) 
enables application writers to understand their interaction with the 
system. 

This base API provides full access to OceanStore functionality 
in terms of sessions, session guarantees, updates, and callbacks. A 
session is a sequence of reads and writes to potentially different 
objects that are related to one another through session guarantees. 
Guarantees define the level of consistency seen by accesses through 
a session. The API provides mechanisms to develop arbitrarily 
complex updates in the form described in Section 4.4. The API 
also provides a callback feature to notify applications of relevant 
events. An application can register an application-level handler to 
be invoked at the occurrence of relevant events, such as the commit 
or abort of an update. 

Applications with more basic requirements are supported 
through facades to the standard API. A facade is an interface to 
the API that provides a traditional, familiar interface. For exam- 
ple, a transaction facade would provide an abstraction atop the 
OceanStore API so that the developer could access the system
in terms of traditional transactions. The facade would simplify
the application writer's job by ensuring proper session guarantees,
reusing standard update templates, and automatically computing
read sets and write sets for each update.

Figure 7: The Cycle of Introspection

[Figure labels: Event Handlers; Forward Aggregates for Further Processing; Issue Local Optimization Commands]

Of course, OceanStore is a new system in a world of legacy code, 
and it would be unreasonable to expect the authors of existing ap-
plications to port their work to an as yet undeployed system. There-
fore, OceanStore provides a number of legacy facades that imple-
ment common APIs, including a Unix file system, a transactional
database, and a gateway to the World Wide Web. These interfaces 
exist as libraries or "plugins" to existing browsers or operating sys- 
tems. They permit users to access legacy documents while enjoy- 
ing the ubiquitous and secure access, durability, and performance 
advantages of OceanStore. 

4.7 Introspection 

As envisioned, OceanStore will consist of millions of servers 
with varying connectivity, disk capacity, and computational power. 
Servers and devices will connect, disconnect, and fail sporadically. 
Server and network load will vary from moment to moment. Man- 
ually tuning a system so large and varied is prohibitively complex. 
Worse, because OceanStore is designed to operate using the utility 
model, manual tuning would involve cooperation across adminis- 
trative boundaries. 

To address these problems, OceanStore employs introspection,
an architectural paradigm that mimics adaptation in biological sys- 
tems. As shown in Figure 7, introspection augments a system's 
normal operation (computation), with observation and optimiza- 
tion. Observation modules monitor the activity of a running system 
and keep a historical record of system behavior. They also employ 
sophisticated analyses to extract patterns from these observations. 
Optimization modules use the resulting analysis to adjust or adapt 
the computation. 

OceanStore uses introspective mechanisms throughout the sys- 
tem. Although we have insufficient space to describe each use in 
detail, we will give a flavor of our techniques below. 

4.7.1 Architecture 

We have designed a common architecture for introspective systems 
in OceanStore (see Figure 8). These systems process local events, 
forwarding summaries up a distributed hierarchy to form approx- 
imate global views of the system. Events include any incoming 
message or noteworthy physical measurement. Our three-point ap- 
proach provides a framework atop which we are developing spe- 
cific observation and optimization modules. 

The high event rate (see footnote 13) precludes extensive online processing. In-
stead, a level of fast event handlers summarizes local events. These
summaries are stored in a local database. At the leaves of the hier- 
archy, this database may reside only in memory; we loosen durabil- 
ity restrictions for local observations in order to attain the necessary 
event rate. 



13 Each machine initiates and receives roughly as many messages as local
area network file systems. In addition, the routing infrastructure requires
communication proportional to the logarithm of the size of the network.



Figure 8: Fast event handlers summarize and respond to local 
events. For efficiency, the "database" may be only soft state 
(see text). Further processing analyzes trends and aggregate 
information across nodes. 



We describe all event handlers in a simple domain-specific lan- 
guage. This language includes primitives for operations like av- 
eraging and filtering, but explicitly prohibits loops. We expect 
this model to provide sufficient power, flexibility, and extensibility, 
while enabling the verification of security and resource consump- 
tion restrictions placed on event handlers. 

A second level of more powerful algorithms periodically pro- 
cesses the information in the database. This level can perform so- 
phisticated analyses and incorporate historical information, allow- 
ing the system to detect and respond to long-term trends. 

Finally, after processing and responding to its own events, a third 
level of each node forwards an appropriate summary of its knowl- 
edge to a parent node for further processing on the wider scale. The 
infrastructure uses the standard OceanStore location mechanism to 
locate that node, which is identified by its GUID. Conversely, we 
could distribute the information to remote optimization modules as 
OceanStore objects that would also be accessed via the standard 
location mechanism. 
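
The three levels described above can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, the handlers are written in Python rather than the loop-free domain-specific language the text mentions, and the parent is referenced directly instead of being located by GUID.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Summary:
    """Running count and total, enough to report an average."""
    count: int = 0
    total: float = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value

    def merge(self, other: "Summary") -> None:
        self.count += other.count
        self.total += other.total

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


@dataclass
class Node:
    parent: Optional["Node"] = None
    # Soft-state "database" of local summaries, kept only in memory.
    local: Dict[str, Summary] = field(default_factory=dict)
    # Summaries received from child nodes.
    from_children: Dict[str, Summary] = field(default_factory=dict)

    def handle_event(self, kind: str, value: float) -> None:
        """Level 1: fast handler that filters, then folds into a summary."""
        if value < 0:  # filter out invalid measurements
            return
        self.local.setdefault(kind, Summary()).add(value)

    def is_overloaded(self, kind: str, limit: float) -> bool:
        """Level 2: periodic analysis over the local database."""
        summary = self.local.get(kind)
        return summary is not None and summary.mean > limit

    def forward(self) -> None:
        """Level 3: push summaries to the parent and reset local state."""
        if self.parent is None:
            return
        for source in (self.local, self.from_children):
            for kind, summary in source.items():
                self.parent.from_children.setdefault(kind, Summary()).merge(summary)
        self.local.clear()
        self.from_children.clear()
```

Because only counts and totals travel upward, the parent obtains an approximate global view without seeing individual events.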

4.7.2 Uses of Introspection

We use introspection to manage a number of subsystems in the 
OceanStore. Below, we will discuss several of these components. 

Cluster Recognition: Cluster recognition attempts to identify and 
group closely related files. Each client machine contains an event 
handler triggered by each data object access. This handler incre- 
mentally constructs a graph representing the semantic distance [28] 
among data objects, which requires only a few operations per ac- 
cess. 

Periodically, we run a clustering algorithm that consumes this 
graph and detects clusters of strongly-related objects. The fre- 
quency of this operation adapts to the stability of the input and 
the available processing resources. The result of the clustering 
algorithm is forwarded to a global analysis layer that publishes 
small objects describing established clusters. Like directory list- 
ings, these objects help remote optimization modules collocate and 
prefetch related files. 
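
The two stages of cluster recognition can be sketched as below. The stand-ins are assumptions, not the method in the text: co-access counts within a short sliding window replace the semantic distance of [28], and connected components over sufficiently heavy edges replace the unspecified clustering algorithm. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class AccessGraph:
    def __init__(self, window: int = 1) -> None:
        self.window = window
        self.recent: List[str] = []  # last `window` objects accessed
        self.weight: Dict[Tuple[str, str], int] = defaultdict(int)

    def on_access(self, obj: str) -> None:
        """Event handler: a few operations per data object access."""
        for other in self.recent:
            if other != obj:
                edge = (obj, other) if obj < other else (other, obj)
                self.weight[edge] += 1
        self.recent = (self.recent + [obj])[-self.window:]

    def clusters(self, min_weight: int) -> List[List[str]]:
        """Periodic pass: group objects linked by edges of at least min_weight."""
        parent: Dict[str, str] = {}

        def find(x: str) -> str:
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for (a, b), w in self.weight.items():
            if w >= min_weight:
                parent[find(a)] = find(b)

        groups = defaultdict(set)
        for x in list(parent):
            groups[find(x)].add(x)
        return sorted(sorted(g) for g in groups.values())


graph = AccessGraph(window=1)
for name in ["a", "b", "a", "b", "c", "x", "y", "x", "y"]:
    graph.on_access(name)
print(graph.clusters(min_weight=2))
```

The handler only updates a handful of counters per access, while the more expensive grouping runs separately and can be scheduled as resources allow.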

Replica Management: Replica management adjusts the number 
and location of floating replicas in order to service access requests 
more efficiently. Event handlers monitor client requests and system 
load, noting when access to a specific replica exceeds its resource 
allotment. When access requests overwhelm a replica, it forwards 
a request for assistance to its parent node. The parent, which tracks 
locally available resources, can create additional floating replicas 
on nearby nodes to alleviate load. 

Conversely, replica management eliminates floating replicas that 
have fallen into disuse. Notification of a replica's termination also
propagates to parent nodes, which can adjust that object's dissemi-
nation tree.

In addition to these short-term decisions, nodes regularly analyze 
global usage trends, allowing additional optimizations. For exam- 
ple, OceanStore can detect periodic migration of clusters from site 
to site and prefetch data based on these cycles. Thus users will find 
their project files and email folder on a local machine during the 
work day, and waiting for them on their home machines at night. 

Other Uses: OceanStore uses introspective mechanisms in many 
other aspects as well. Specifically, introspection improves the man-
ageability and performance of the routing structure, enables con-
struction of efficient update dissemination trees, ensures the avail-
ability and durability of archival fragments, identifies unreliable
peer organizations, and performs continuous confidence estimation 
on its own optimizations in order to reduce harmful changes and 
feedback cycles. 

5 STATUS 

We are currently implementing an OceanStore prototype that we 
will deploy for testing and evaluation. The system is written in 
Java with a state machine-based request model for fast I/O [22].
Initially, OceanStore will communicate with applications through 
a UNIX file system interface and a read-only proxy for the World 
Wide Web in addition to the native OceanStore API. 

We have explored the requirements that our security guarantees 
place on a storage architecture. Specifically, we have explored dif- 
ferences between enforcing read and write permissions in an un- 
trusted setting, emphasizing the importance of the ability of clients 
to validate the correctness of any data returned to them. This ex- 
ploration included not only checking the integrity of the data itself, 
but also checking that the data requested was the data returned, and 
that all levels of metadata were protected as strongly as the data
itself. A prototype cryptographic file system provided a testbed for 
specific security mechanisms. 

A prototype for the probabilistic data location component has 
been implemented and verified. Simulation results show that our 
algorithm finds nearby objects with near-optimal efficiency. 

We have implemented prototype archival systems that use both 
Reed-Solomon and Tornado codes for redundancy encoding. Al- 
though only one half of the fragments were required to recon- 
struct the object, we found that issuing requests for extra fragments 
proved beneficial due to dropped requests. 

We have implemented the introspective prefetching mechanism 
for a local file system. Testing showed that the method correctly 
captured high-order correlations, even in the presence of noise. We 
will combine that mechanism with an optimization module appro- 
priate for the wide-area network. 

6 RELATED WORK 

Distributed systems such as Taos [52] assume untrusted networks 
and applications, but rely on some trusted computing base. Crypto- 
graphic file systems such as Blaze's CFS [5] provide end-to-end se- 
crecy, but include no provisions for sharing data, nor for protecting 
integrity independently from secrecy. The Secure File System [24] 
supports sharing with access control lists, but fails to provide in- 
dependent support for integrity, and trusts a single server to dis- 
tribute encryption keys. The Farsite project [8] is more similar to 
OceanStore than these other works, but while it assumes the use of 
untrusted clients, it does not address a wide-area infrastructure. 

SDSI [1] and SPKI [15] address the problem of securely dis-
tributing keys and certificates in a decentralized manner. Policy-
Maker [6] deals with the description of trust relations. Mazières
proposes self-certifying paths to separate key management from 
system security [35]. 

Bloom filters [7] are commonly used as compact representa- 
tions of large sets. The R* distributed database [33] calculates 
them on demand to implement efficient semijoins. The Summary 
Cache [16] pushes Bloom filters between cooperating web caches, 
although their method does not scale well in the number of caches. 

Distributing data for performance, availability, or survivability 
has been studied extensively in both the file systems and database 
communities. A summary of distributed file systems can be found 
in [31]. In particular, Bayou [13] and Coda [26] use replication 
to improve availability at the expense of consistency and intro- 
duce specialized conflict resolution procedures. Sprite [36] also 
uses replication and caching to improve availability and perfor- 
mance, but has a guarantee of consistency that incurs a performance 
penalty in the face of multiple writers. None of these systems ad- 
dresses the range of security concerns that OceanStore does, al- 
though Bayou examines some problems that occur when replicas 
are corrupted [48]. 

Gray et al. argue against promiscuous replication in [19].
OceanStore differs from the class of systems they describe because 
it does not bind floating replicas to specific machines, and it does 
not replicate all objects at each server. 

OceanStore's second tier of floating replicas is similar to trans-
actional caches; in the taxonomy of [17] our algorithm is detection-
based and performs its validity checks at commit time. In contrast 
to similar systems, our merge predicates should decrease the num- 
ber of transactions aborted due to out-of-date caches. 

Many previous projects have explored feedback-driven adapta- 
tion in extensible operating systems [45], databases [11], file sys- 
tems [34], global operating systems [9], and storage devices [51]. 
Although these projects employ differing techniques and terminol- 
ogy, each could be analyzed with respect to the introspective model. 

The Seer project formulated the concept of semantic dis- 
tance [28] and collects clusters of related files for automated hoard- 
ing. Others have used file system observation to drive automatic 
prefetching [20, 27]. 

Introspective replica management for web content was examined 
in AT&T's Radar project [41], which considers read-only data in a 
trusted infrastructure. The Mariposa project [46] addresses inter- 
domain replication with an economic model. Others optimize com- 
munication cost when selecting a new location for replica place- 
ment [2] within a single administrative domain. 

Similar to OceanStore, the Intermemory project [18] uses 
Cauchy Reed-Solomon Codes to achieve wide-scale durability. We
anticipate that our combination of active and archival object forms 
will allow greater update performance while retaining Intermem- 
ory's survivability benefits. 

7 CONCLUSION 

The rise of ubiquitous computing has spawned an urgent need for 
persistent information. In this paper we presented OceanStore, a 
utility infrastructure designed to span the globe and provide secure, 
highly available access to persistent objects. 

Several properties distinguish OceanStore from other systems: 
the utility model, the untrusted infrastructure, support for truly no- 
madic data, and use of introspection to enhance performance and 
maintainability. A utility model makes the notion of a global sys- 
tem possible, but introduces the possibility of untrustworthy servers 
in the system. To this end, we assume that servers may be run by 
adversaries and cannot be trusted with cleartext; as a result, server-
side operations such as conflict-resolution must be performed di-
rectly on encrypted information. Nomadic data permits a wide
range of optimizations for access to information by bringing it 
"close" to where it is needed, and enables rapid response to re- 
gional outages and denial-of-service attacks. These optimizations 
are assisted by introspection, the continuous online collection and 
analysis of access patterns. 

OceanStore is under construction. This paper presented many 
of the design elements and algorithms of OceanStore; several have 
been implemented. Hopefully, we have convinced the reader that 
an infrastructure such as OceanStore is possible to construct; that it 
is desirable should be obvious. 

8 ACKNOWLEDGEMENTS 

We would like to thank the following people who have been instru-
mental in helping us to refine our thoughts about OceanStore (in
alphabetical order): William Bolosky, Michael Franklin, Jim Gray, 
James Hamilton, Joseph Hellerstein, Anthony Joseph, Josh Mac- 
Donald, David Patterson, Satish Rao, Dawn Song, Bill Tetzlaff, 
Doug Tygar, Steve Weis, and Richard Wheeler. 

In addition, we would like to acknowledge the enthusiastic sup- 
port of our DARPA program manager, Jean Scholtz, and industrial 
funding from EMC and IBM. 

9 REFERENCES 

[1] M. Abadi. On SDSI's linked local name spaces. In Proc. of
IEEE CSFW, 1997. 

[2] S. Acharya and S. B. Zdonik. An efficient scheme for dynamic 
data replication. Technical Report CS-93-43, Department of 
Computer Science, Brown University, 1993. 

[3] T. Anderson, M. Dahlin, J. Neefe, D. Patterson, D. Roselli, 
and R. Wang. Serverless Network File Systems. In Proc. of 
ACM SOSP, Dec. 1995.

[4] B. Barak, A. Herzberg, D. Naor, and E. Shai. The proactive 
security toolkit and applications. In Proc. of ACM CCS Conf, 
pages 18-27, Nov. 1999. 

[5] M. Blaze. A cryptographic file system for UNIX. In Proc. of 
ACM CCS Conf, Nov. 1993.

[6] M. Blaze, J. Feigenbaum, and J. Lacy. Decentralized trust 
management. In Proc. of IEEE SRSP, May 1996. 

[7] B. Bloom. Space/time trade-offs in hash coding with allow- 
able errors. In Communications of the ACM, volume 13(7), 
pages 422-426, July 1970. 

[8] W. Bolosky, J. Douceur, D. Ely, and M. Theimer. Feasibility 
of a serverless distributed file system deployed on an existing 
set of desktop PCs. In Proc. of Sigmetrics, June 2000.

[9] W. Bolosky, R. Draves, R. Fitzgerald, C. Fraser, M. Jones, 
T. Knoblock, and R. Rashid. Operating systems directions for 
the next millennium. In Proc. of HOTOS Conf, May 1997.

[10] M. Castro and B. Liskov. Practical Byzantine fault tolerance.
In Proc. of USENIX Symp. on OSDI, 1999.

[11] S. Chaudhuri and V. Narasayya. AutoAdmin "what-if" index
analysis utility. In Proc. of ACM SIGMOD Conf, pages 367- 
378, June 1998. 

[12] M. Dahlin, T. Anderson, D. Patterson, and R. Wang. Coopera- 
tive caching: Using remote client memory to improve file sys-
tem performance. In Proc. of USENIX Symp. on OSDI, Nov.
1994. 

[13] A. Demers, K. Petersen, M. Spreitzer, D. Terry, M. Theimer, 
and B. Welch. The Bayou architecture: Support for data shar- 
ing among mobile users. In Proc. of IEEE Workshop on Mo- 
bile Computing Systems & Applications, Dec. 1994. 



[14] W. Edwards, E. Mynatt, K. Petersen, M. Spreitzer, D. Terry, 
and M. Theimer. Designing and implementing asynchronous 
collaborative applications with Bayou. In Proc. of ACM Symp.
on User Interface Software & Technology, pages 119-128, 
1997. 

[15] C. Ellison, B. Frantz, R. Rivest, B. Thomas, and T. Ylonen. 
SPKI certificate theory. RFC 2693, 1999. 

[16] L. Fan, P. Cao, J. Almeida, and A. Broder. Summary cache: 
A scalable wide-area Web cache sharing protocol. In Proc. of 
ACM SIGCOMM Conf, pages 254-265, Sept. 1998.

[17] M. Franklin, M. Carey, and M. Livny. Transactional client- 
server cache consistency: Alternatives and performance. 
ACM Transactions on Database Systems, 22(3):315-363,
Sept. 1997. 

[18] A. Goldberg and P. Yianilos. Towards an archival intermem-
ory. In Proc. of IEEE ADL, pages 147-156, Apr. 1998.

[19] J. Gray, P. Helland, P. O'Neil, and D. Shasha. The dangers of 
replication and a solution. In Proc. of ACM SIGMOD Conf, 
volume 25, 2, pages 173-182, June 1996.

[20] J. Griffioen and R. Appleton. Reducing file system latency us- 
ing a predictive approach. In Proc. of USENIX Summer Tech- 
nical Conf, June 1994. 

[21] E. Hagersten, A. Landin, and S. Haridi. DDM — A Cache- 
only Memory Architecture. IEEE Computer, Sept. 1992. 

[22] J. Hill, R. Szewczyk, A. Woo, D. Culler, S. Hollar, and K. Pis- 
ter. System architecture directions for networked sensors. In 
Proc. of ASPLOS, Nov. 2000.

[23] J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satya- 
narayanan, R. Sidebotham, and M. West. Scale and perfor- 
mance in a distributed file system. ACM Transactions on 
Computer Systems, 6(1):51-81, Feb. 1988. 

[24] J. Hughes, C. Feist, H. S, M. O'Keefe, and D. Corcoran. A 
universal access, smart-card-based secure file system. In Proc. 
of the Atlanta Linux Showcase, Oct. 1999. 

[25] L. Kawell, S. Beckhardt, T. Halvorsen, R. Ozzie, and I. Greif. 
Replicated document management in a group communication 
system. In Proc. of ACM CSCW Conf, Sept. 1988.

[26] J. Kistler and M. Satyanarayanan. Disconnected operation in 
the Coda file system. ACM Transactions on Computer Sys-
tems, 10(1):3-25, Feb. 1992.

[27] T. Kroeger and D. Long. Predicting file-system actions from
prior events. In Proc. of USENIX Winter Technical Conf, 
pages 319-328, Jan. 1996. 

[28] G. Kuenning. The design of the seer predictive caching sys- 
tem. In Proc. of IEEE Workshop on Mobile Computing Sys-
tems & Applications, Dec. 1994.

[29] H. Kung and J. Robinson. On optimistic methods for con- 
currency control. ACM Transactions on Database Systems, 
6(2):213-226, June 1981.

[30] L. Lamport, R. Shostak, and M. Pease. The byzantine gener-
als problem. ACM TOPLAS, 4(3):382-401, 1982.

[31] E. Levy and A. Silberschatz. Distributed file systems: Con- 
cepts and examples. ACM Computing Surveys, 22(4):321- 
375, Dec. 1990. 

[32] M. Luby, M. Mitzenmacher, M. Shokrollahi, D. Spielman, 
and V. Stemann. Analysis of low density codes and improved 
designs using irregular graphs. In Proc. of ACM STOC, May 
1998. 

[33] L. Mackert and G. Lohman. R* optimizer validation and per- 
formance for distributed queries. In Proc. of Intl. Conf on 
VLDB, Aug. 1986. 






[34] J. Matthews, D. Roselli, A. Costello, R. Wang, and T. Ander- 
son. Improving the performance of log-structured file systems 
with adaptive methods. In Proc. of ACM SOSP, Oct. 1997. 

[35] D. Mazières, M. Kaminsky, F. Kaashoek, and E. Witchel. Sep-
arating key management from file system security. In Proc. of
ACM SOSP, 1999.

[36] M. Nelson, B. Welch, and J. Ousterhout. Caching in the sprite 
network file system. IEEE/ACM Transactions on Networking,
6(1):134-154, Feb. 1988.

[37] NIST. FIPS 186 digital signature standard. May 1994. 

[38] D. Norman. The Invisible Computer, pages 62-63. MIT Press, 
Cambridge, MA, 1999. 

[39] J. Plank. A tutorial on reed-solomon coding for fault- 
tolerance in raid-like systems. Software Practice and Expe-
rience, 27(9):995-1012, Sept. 1997.

[40] C. Plaxton, R. Rajaraman, and A. Richa. Accessing nearby
copies of replicated objects in a distributed environment. In
Proc. of ACM SPAA, pages 311-320, Newport, Rhode Island,
June 1997. 

[41] M. Rabinovich, I. Rabinovich, R. Rajaraman, and A. Aggar- 
wal. A dynamic object replication and migration protocol for 
an internet hosting service. In Proc. of IEEE ICDCS, pages 
101-113, June 1999.

[42] R. Rivest and B. Lampson. SDSI — A simple distributed secu- 
rity infrastructure. Manuscript, 1996. 

[43] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and
B. Lyon. Design and implementation of the Sun Network 
Filesystem. In Proc. of USENIX Summer Technical Conf, 
June 1985. 



[44] D. Santry, M. Feeley, N. Hutchinson, A. Veitch, R. Carton,
and J. Ofir. Deciding when to forget in the Elephant file sys-
tem. In Proc. of ACM SOSP, Dec. 1999.

[45] M. Seltzer and C. Small. Self-monitoring and self-adapting 
operating systems. In Proc. of HOTOS Conf, pages 124-129,
May 1997. 

[46] J. Sidell, P. Aoki, S. Barr, A. Sah, C. Staelin, M. Stonebraker, 
and A. Yu. Data replication in Mariposa. In Proc. of IEEE 
ICDE, pages 485-495, Feb. 1996. 

[47] D. Song, D. Wagner, and A. Perrig. Search on encrypted data. 
To be published in Proc. of IEEE SRSP, May 2000. 

[48] M. Spreitzer, M. Theimer, K. Petersen, A. Demers, and 
D. Terry. Dealing with server corruption in weakly consis- 
tent, replicated data systems. In Proc. of ACM/IEEE Mobi- 
Com Conf, pages 234-240, Sept. 1997. 

[49] M. Stonebraker. The design of the Postgres storage system. In
Proc. of Intl. Conf on VLDB, Sept. 1987.

[50] M. Weiser. The computer for the twenty-first century. Scientific
American, Sept. 1991.

[51] J. Wilkes, R. Golding, C. Staelin, and T. Sullivan. The HP 
AutoRAID hierarchical storage system. ACM Transactions on 
Computer Systems, pages 108-136, Feb. 1996. 

[52] E. Wobber, M. Abadi, M. Burrows, and B. Lampson. Authen- 
tication in the Taos operating system. In Proc. of ACM SOSP, 
pages 256-269, Dec. 1993. 






